* [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle
@ 2021-10-01 23:22 Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
` (4 more replies)
0 siblings, 5 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime. This simple
pattern ties all the state (vfio, dev, and cdev) together in one memory
structure and uses container_of() to navigate between the layers.
This is a followup to the discussion here:
https://lore.kernel.org/kvm/20210921155705.GN327412@nvidia.com/
This builds on Christoph's work to revise how the vfio_group works and is
against the latest VFIO tree.
Jason Gunthorpe (5):
vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
vfio: Do not open code the group list search in vfio_create_group()
vfio: Don't leak a group reference if the group already exists
vfio: Use a refcount_t instead of a kref in the vfio_group
vfio: Use cdev_device_add() instead of device_create()
drivers/vfio/vfio.c | 363 +++++++++++++++++---------------------------
1 file changed, 140 insertions(+), 223 deletions(-)
base-commit: d9a0cd510c3383b61db6f70a84e0c3487f836a63
--
2.33.0
* [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
` (2 more replies)
2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
` (3 subsequent siblings)
4 siblings, 3 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built using a blocking_notifier_chain, which integrates an rwsem. The
notifier function therefore cannot be running before registration or
after unregistration returns.
When considering how the notifier function interacts with create/destroy
of the group there are two fringe cases: the notifier runs before
list_add(&vfio.group_list) and the notifier runs after the kref
becomes 0.
Prior to vfio_create_group() unlocking and returning we have
container_users == 0
device_list == empty
And this cannot change until the mutex is unlocked.
After the kref goes to zero we must also have
container_users == 0
device_list == empty
Both are required because they are balanced operations and a 0 kref means
some caller became unbalanced. Add the missing assertion that
container_users must be zero as well.
These two facts are important because when checking each operation we see:
- IOMMU_GROUP_NOTIFY_ADD_DEVICE
Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
0 container_users ends the call
Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty, during
kref == 0 nothing can read this list, and it will be freed soon.
Since vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree().
This allows deleting all of the deferred group put code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 89 +++++----------------------------------------
1 file changed, 9 insertions(+), 80 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 08b27b64f0f935..32a53cb3598524 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
static void vfio_group_unlock_and_free(struct vfio_group *group)
{
+ struct vfio_unbound_dev *unbound, *tmp;
+
mutex_unlock(&vfio.group_lock);
/*
* Unregister outside of lock. A spurious callback is harmless now
* that the group is no longer in vfio.group_list.
*/
iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+
+ list_for_each_entry_safe(unbound, tmp,
+ &group->unbound_list, unbound_next) {
+ list_del(&unbound->unbound_next);
+ kfree(unbound);
+ }
kfree(group);
}
@@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
group->nb.notifier_call = vfio_iommu_group_notifier;
- /*
- * blocking notifiers acquire a rwsem around registering and hold
- * it around callback. Therefore, need to register outside of
- * vfio.group_lock to avoid A-B/B-A contention. Our callback won't
- * do anything unless it can find the group in vfio.group_list, so
- * no harm in registering early.
- */
ret = iommu_group_register_notifier(iommu_group, &group->nb);
if (ret) {
kfree(group);
@@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
static void vfio_group_release(struct kref *kref)
{
struct vfio_group *group = container_of(kref, struct vfio_group, kref);
- struct vfio_unbound_dev *unbound, *tmp;
struct iommu_group *iommu_group = group->iommu_group;
WARN_ON(!list_empty(&group->device_list));
+ WARN_ON(atomic_read(&group->container_users));
WARN_ON(group->notifier.head);
- list_for_each_entry_safe(unbound, tmp,
- &group->unbound_list, unbound_next) {
- list_del(&unbound->unbound_next);
- kfree(unbound);
- }
-
device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
list_del(&group->vfio_next);
vfio_free_group_minor(group->minor);
@@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group *group)
kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
}
-struct vfio_group_put_work {
- struct work_struct work;
- struct vfio_group *group;
-};
-
-static void vfio_group_put_bg(struct work_struct *work)
-{
- struct vfio_group_put_work *do_work;
-
- do_work = container_of(work, struct vfio_group_put_work, work);
-
- vfio_group_put(do_work->group);
- kfree(do_work);
-}
-
-static void vfio_group_schedule_put(struct vfio_group *group)
-{
- struct vfio_group_put_work *do_work;
-
- do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
- if (WARN_ON(!do_work))
- return;
-
- INIT_WORK(&do_work->work, vfio_group_put_bg);
- do_work->group = group;
- schedule_work(&do_work->work);
-}
-
/* Assume group_lock or group reference is held */
static void vfio_group_get(struct vfio_group *group)
{
kref_get(&group->kref);
}
-/*
- * Not really a try as we will sleep for mutex, but we need to make
- * sure the group pointer is valid under lock and get a reference.
- */
-static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
-{
- struct vfio_group *target = group;
-
- mutex_lock(&vfio.group_lock);
- list_for_each_entry(group, &vfio.group_list, vfio_next) {
- if (group == target) {
- vfio_group_get(group);
- mutex_unlock(&vfio.group_lock);
- return group;
- }
- }
- mutex_unlock(&vfio.group_lock);
-
- return NULL;
-}
-
static
struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
{
@@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
struct device *dev = data;
struct vfio_unbound_dev *unbound;
- /*
- * Need to go through a group_lock lookup to get a reference or we
- * risk racing a group being removed. Ignore spurious notifies.
- */
- group = vfio_group_try_get(group);
- if (!group)
- return NOTIFY_OK;
-
switch (action) {
case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
vfio_group_nb_add_dev(group, dev);
@@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
mutex_unlock(&group->unbound_lock);
break;
}
-
- /*
- * If we're the last reference to the group, the group will be
- * released, which includes unregistering the iommu group notifier.
- * We hold a read-lock on that notifier list, unregistering needs
- * a write-lock... deadlock. Release our reference asynchronously
- * to avoid that situation.
- */
- vfio_group_schedule_put(group);
return NOTIFY_OK;
}
--
2.33.0
* [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
2021-10-12 6:37 ` Tian, Kevin
2021-10-12 8:52 ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
` (2 subsequent siblings)
4 siblings, 2 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
Split the list search out of vfio_group_get_from_iommu() into
__vfio_group_get_from_iommu() so that vfio_create_group() can call it,
consolidating the duplicated code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
1 file changed, 30 insertions(+), 25 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 32a53cb3598524..1cb12033b02240 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -344,10 +344,35 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
/**
* Group objects - create, release, get, put, search
*/
+static struct vfio_group *
+__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+ struct vfio_group *group;
+
+ list_for_each_entry(group, &vfio.group_list, vfio_next) {
+ if (group->iommu_group == iommu_group) {
+ vfio_group_get(group);
+ return group;
+ }
+ }
+ return NULL;
+}
+
+static struct vfio_group *
+vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+ struct vfio_group *group;
+
+ mutex_lock(&vfio.group_lock);
+ group = __vfio_group_get_from_iommu(iommu_group);
+ mutex_unlock(&vfio.group_lock);
+ return group;
+}
+
static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
enum vfio_group_type type)
{
- struct vfio_group *group, *tmp;
+ struct vfio_group *group, *existing_group;
struct device *dev;
int ret, minor;
@@ -378,12 +403,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
mutex_lock(&vfio.group_lock);
/* Did we race creating this group? */
- list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
- if (tmp->iommu_group == iommu_group) {
- vfio_group_get(tmp);
- vfio_group_unlock_and_free(group);
- return tmp;
- }
+ existing_group = __vfio_group_get_from_iommu(iommu_group);
+ if (existing_group) {
+ vfio_group_unlock_and_free(group);
+ return existing_group;
}
minor = vfio_alloc_group_minor(group);
@@ -440,24 +463,6 @@ static void vfio_group_get(struct vfio_group *group)
kref_get(&group->kref);
}
-static
-struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
-{
- struct vfio_group *group;
-
- mutex_lock(&vfio.group_lock);
- list_for_each_entry(group, &vfio.group_list, vfio_next) {
- if (group->iommu_group == iommu_group) {
- vfio_group_get(group);
- mutex_unlock(&vfio.group_lock);
- return group;
- }
- }
- mutex_unlock(&vfio.group_lock);
-
- return NULL;
-}
-
static struct vfio_group *vfio_group_get_from_minor(int minor)
{
struct vfio_group *group;
--
2.33.0
* [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
4 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.
Change the semantics of vfio_create_group() to not move the reference in
from the caller, but instead obtain a new reference inside and leave the
caller's reference alone. The two callers must now call iommu_group_put().
This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.
Fixes: cba3345cc494 ("vfio: VFIO core")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1cb12033b02240..bf233943dc992f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
list_del(&unbound->unbound_next);
kfree(unbound);
}
+ iommu_group_put(group->iommu_group);
kfree(group);
}
@@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
atomic_set(&group->opened, 0);
init_waitqueue_head(&group->container_q);
group->iommu_group = iommu_group;
+ /* put in vfio_group_unlock_and_free() */
+ iommu_group_ref_get(iommu_group);
group->type = type;
BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
@@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
ret = iommu_group_register_notifier(iommu_group, &group->nb);
if (ret) {
- kfree(group);
- return ERR_PTR(ret);
+ group = ERR_PTR(ret);
+ goto err_put_group;
}
mutex_lock(&vfio.group_lock);
@@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
mutex_unlock(&vfio.group_lock);
+err_put_group:
+ iommu_group_put(iommu_group);
+ kfree(group);
return group;
}
@@ -439,7 +445,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
static void vfio_group_release(struct kref *kref)
{
struct vfio_group *group = container_of(kref, struct vfio_group, kref);
- struct iommu_group *iommu_group = group->iommu_group;
WARN_ON(!list_empty(&group->device_list));
WARN_ON(atomic_read(&group->container_users));
@@ -449,7 +454,6 @@ static void vfio_group_release(struct kref *kref)
list_del(&group->vfio_next);
vfio_free_group_minor(group->minor);
vfio_group_unlock_and_free(group);
- iommu_group_put(iommu_group);
}
static void vfio_group_put(struct vfio_group *group)
@@ -734,7 +738,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
ret = PTR_ERR(group);
goto out_remove_device;
}
-
+ iommu_group_put(iommu_group);
return group;
out_remove_device:
@@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
/* a newly created vfio_group keeps the reference. */
group = vfio_create_group(iommu_group, VFIO_IOMMU);
- if (IS_ERR(group))
- goto out_put;
- return group;
-
out_put:
iommu_group_put(iommu_group);
return group;
--
2.33.0
* [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
` (2 preceding siblings ...)
2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
` (2 more replies)
2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
4 siblings, 3 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref
controls the period when the vfio_group is registered in sysfs and
visible in the internal lookup. Switch it to a refcount_t instead.
The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index bf233943dc992f..dbe7edd88ce35c 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -69,7 +69,7 @@ struct vfio_unbound_dev {
};
struct vfio_group {
- struct kref kref;
+ refcount_t users;
int minor;
atomic_t container_users;
struct iommu_group *iommu_group;
@@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
if (!group)
return ERR_PTR(-ENOMEM);
- kref_init(&group->kref);
+ refcount_set(&group->users, 1);
INIT_LIST_HEAD(&group->device_list);
mutex_init(&group->device_lock);
INIT_LIST_HEAD(&group->unbound_list);
@@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
return group;
}
-/* called with vfio.group_lock held */
-static void vfio_group_release(struct kref *kref)
+static void vfio_group_put(struct vfio_group *group)
{
- struct vfio_group *group = container_of(kref, struct vfio_group, kref);
+ if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
+ return;
WARN_ON(!list_empty(&group->device_list));
WARN_ON(atomic_read(&group->container_users));
@@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
vfio_group_unlock_and_free(group);
}
-static void vfio_group_put(struct vfio_group *group)
-{
- kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
-}
-
-/* Assume group_lock or group reference is held */
static void vfio_group_get(struct vfio_group *group)
{
- kref_get(&group->kref);
+ refcount_inc(&group->users);
}
static struct vfio_group *vfio_group_get_from_minor(int minor)
@@ -1662,6 +1656,7 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
if (ret)
return ERR_PTR(ret);
+ /* Since the caller holds the fget on the file users must be >= 1 */
vfio_group_get(group);
return group;
--
2.33.0
* [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
` (3 preceding siblings ...)
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
2021-10-12 8:33 ` Tian, Kevin
2021-10-12 8:57 ` Liu, Yi L
4 siblings, 2 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
Modernize how vfio is creating the group char dev and sysfs presence.
These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.
This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.
Split the code to make this possible:
- vfio_group_alloc()/vfio_group_release() are paired functions to
alloc/free the vfio_group. The release is done under the struct device
kref.
- vfio_create_group()/vfio_group_put() are paired functions that manage
the sysfs/cdev lifetime. Once the users count reaches zero the vfio
group's userspace presence is destroyed.
- The IDR is replaced with an IDA. container_of(inode->i_cdev)
is used to get back to the vfio_group during fops open. The IDA
assigns unique minor numbers.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 200 +++++++++++++++++++++-----------------------
1 file changed, 94 insertions(+), 106 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index dbe7edd88ce35c..01e04947250f40 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -43,9 +43,8 @@ static struct vfio {
struct list_head iommu_drivers_list;
struct mutex iommu_drivers_lock;
struct list_head group_list;
- struct idr group_idr;
- struct mutex group_lock;
- struct cdev group_cdev;
+ struct mutex group_lock; /* locks group_list */
+ struct ida group_ida;
dev_t group_devt;
} vfio;
@@ -69,14 +68,14 @@ struct vfio_unbound_dev {
};
struct vfio_group {
+ struct device dev;
+ struct cdev cdev;
refcount_t users;
- int minor;
atomic_t container_users;
struct iommu_group *iommu_group;
struct vfio_container *container;
struct list_head device_list;
struct mutex device_lock;
- struct device *dev;
struct notifier_block nb;
struct list_head vfio_next;
struct list_head container_next;
@@ -98,6 +97,7 @@ MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode. Thi
#endif
static DEFINE_XARRAY(vfio_device_set_xa);
+static const struct file_operations vfio_group_fops;
int vfio_assign_device_set(struct vfio_device *device, void *set_id)
{
@@ -281,19 +281,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
}
EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
-/**
- * Group minor allocation/free - both called with vfio.group_lock held
- */
-static int vfio_alloc_group_minor(struct vfio_group *group)
-{
- return idr_alloc(&vfio.group_idr, group, 0, MINORMASK + 1, GFP_KERNEL);
-}
-
-static void vfio_free_group_minor(int minor)
-{
- idr_remove(&vfio.group_idr, minor);
-}
-
static int vfio_iommu_group_notifier(struct notifier_block *nb,
unsigned long action, void *data);
static void vfio_group_get(struct vfio_group *group);
@@ -322,26 +309,6 @@ static void vfio_container_put(struct vfio_container *container)
kref_put(&container->kref, vfio_container_release);
}
-static void vfio_group_unlock_and_free(struct vfio_group *group)
-{
- struct vfio_unbound_dev *unbound, *tmp;
-
- mutex_unlock(&vfio.group_lock);
- /*
- * Unregister outside of lock. A spurious callback is harmless now
- * that the group is no longer in vfio.group_list.
- */
- iommu_group_unregister_notifier(group->iommu_group, &group->nb);
-
- list_for_each_entry_safe(unbound, tmp,
- &group->unbound_list, unbound_next) {
- list_del(&unbound->unbound_next);
- kfree(unbound);
- }
- iommu_group_put(group->iommu_group);
- kfree(group);
-}
-
/**
* Group objects - create, release, get, put, search
*/
@@ -370,75 +337,112 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
return group;
}
-static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
- enum vfio_group_type type)
+static void vfio_group_release(struct device *dev)
{
- struct vfio_group *group, *existing_group;
- struct device *dev;
- int ret, minor;
+ struct vfio_group *group = container_of(dev, struct vfio_group, dev);
+ struct vfio_unbound_dev *unbound, *tmp;
+
+ list_for_each_entry_safe(unbound, tmp,
+ &group->unbound_list, unbound_next) {
+ list_del(&unbound->unbound_next);
+ kfree(unbound);
+ }
+
+ mutex_destroy(&group->device_lock);
+ mutex_destroy(&group->unbound_lock);
+ iommu_group_put(group->iommu_group);
+ ida_free(&vfio.group_ida, MINOR(group->dev.devt));
+ kfree(group);
+}
+
+static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
+ enum vfio_group_type type)
+{
+ struct vfio_group *group;
+ int minor;
group = kzalloc(sizeof(*group), GFP_KERNEL);
if (!group)
return ERR_PTR(-ENOMEM);
+ minor = ida_alloc_max(&vfio.group_ida, MINORMASK, GFP_KERNEL);
+ if (minor < 0) {
+ kfree(group);
+ return ERR_PTR(minor);
+ }
+
+ device_initialize(&group->dev);
+ group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
+ group->dev.class = vfio.class;
+ group->dev.release = vfio_group_release;
+ cdev_init(&group->cdev, &vfio_group_fops);
+ group->cdev.owner = THIS_MODULE;
+
refcount_set(&group->users, 1);
INIT_LIST_HEAD(&group->device_list);
mutex_init(&group->device_lock);
INIT_LIST_HEAD(&group->unbound_list);
mutex_init(&group->unbound_lock);
- atomic_set(&group->container_users, 0);
- atomic_set(&group->opened, 0);
init_waitqueue_head(&group->container_q);
group->iommu_group = iommu_group;
- /* put in vfio_group_unlock_and_free() */
+ /* put in vfio_group_release() */
iommu_group_ref_get(iommu_group);
group->type = type;
BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
- group->nb.notifier_call = vfio_iommu_group_notifier;
+ return group;
+}
- ret = iommu_group_register_notifier(iommu_group, &group->nb);
- if (ret) {
- group = ERR_PTR(ret);
- goto err_put_group;
+static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
+ enum vfio_group_type type)
+{
+ struct vfio_group *group;
+ struct vfio_group *ret;
+ int err;
+
+ group = vfio_group_alloc(iommu_group, type);
+ if (IS_ERR(group))
+ return group;
+
+ err = dev_set_name(&group->dev, "%s%d",
+ group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
+ iommu_group_id(iommu_group));
+ if (err) {
+ ret = ERR_PTR(err);
+ goto err_put;
+ }
+
+ group->nb.notifier_call = vfio_iommu_group_notifier;
+ err = iommu_group_register_notifier(iommu_group, &group->nb);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto err_put;
}
mutex_lock(&vfio.group_lock);
/* Did we race creating this group? */
- existing_group = __vfio_group_get_from_iommu(iommu_group);
- if (existing_group) {
- vfio_group_unlock_and_free(group);
- return existing_group;
- }
+ ret = __vfio_group_get_from_iommu(iommu_group);
+ if (ret)
+ goto err_unlock;
- minor = vfio_alloc_group_minor(group);
- if (minor < 0) {
- vfio_group_unlock_and_free(group);
- return ERR_PTR(minor);
+ err = cdev_device_add(&group->cdev, &group->dev);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto err_unlock;
}
- dev = device_create(vfio.class, NULL,
- MKDEV(MAJOR(vfio.group_devt), minor), group, "%s%d",
- group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
- iommu_group_id(iommu_group));
- if (IS_ERR(dev)) {
- vfio_free_group_minor(minor);
- vfio_group_unlock_and_free(group);
- return ERR_CAST(dev);
- }
-
- group->minor = minor;
- group->dev = dev;
-
list_add(&group->vfio_next, &vfio.group_list);
mutex_unlock(&vfio.group_lock);
-
-err_put_group:
- iommu_group_put(iommu_group);
- kfree(group);
return group;
+
+err_unlock:
+ mutex_unlock(&vfio.group_lock);
+ iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+err_put:
+ put_device(&group->dev);
+ return ret;
}
static void vfio_group_put(struct vfio_group *group)
@@ -450,10 +454,12 @@ static void vfio_group_put(struct vfio_group *group)
WARN_ON(atomic_read(&group->container_users));
WARN_ON(group->notifier.head);
- device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
list_del(&group->vfio_next);
- vfio_free_group_minor(group->minor);
- vfio_group_unlock_and_free(group);
+ cdev_device_del(&group->cdev, &group->dev);
+ mutex_unlock(&vfio.group_lock);
+
+ iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+ put_device(&group->dev);
}
static void vfio_group_get(struct vfio_group *group)
@@ -461,20 +467,10 @@ static void vfio_group_get(struct vfio_group *group)
refcount_inc(&group->users);
}
-static struct vfio_group *vfio_group_get_from_minor(int minor)
+/* returns true if the get was obtained */
+static bool vfio_group_try_get(struct vfio_group *group)
{
- struct vfio_group *group;
-
- mutex_lock(&vfio.group_lock);
- group = idr_find(&vfio.group_idr, minor);
- if (!group) {
- mutex_unlock(&vfio.group_lock);
- return NULL;
- }
- vfio_group_get(group);
- mutex_unlock(&vfio.group_lock);
-
- return group;
+ return refcount_inc_not_zero(&group->users);
}
static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
@@ -1484,11 +1480,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
static int vfio_group_fops_open(struct inode *inode, struct file *filep)
{
- struct vfio_group *group;
+ struct vfio_group *group =
+ container_of(inode->i_cdev, struct vfio_group, cdev);
int opened;
- group = vfio_group_get_from_minor(iminor(inode));
- if (!group)
+ if (!vfio_group_try_get(group))
return -ENODEV;
if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
@@ -2296,7 +2292,7 @@ static int __init vfio_init(void)
{
int ret;
- idr_init(&vfio.group_idr);
+ ida_init(&vfio.group_ida);
mutex_init(&vfio.group_lock);
mutex_init(&vfio.iommu_drivers_lock);
INIT_LIST_HEAD(&vfio.group_list);
@@ -2321,11 +2317,6 @@ static int __init vfio_init(void)
if (ret)
goto err_alloc_chrdev;
- cdev_init(&vfio.group_cdev, &vfio_group_fops);
- ret = cdev_add(&vfio.group_cdev, vfio.group_devt, MINORMASK + 1);
- if (ret)
- goto err_cdev_add;
-
pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
#ifdef CONFIG_VFIO_NOIOMMU
@@ -2333,8 +2324,6 @@ static int __init vfio_init(void)
#endif
return 0;
-err_cdev_add:
- unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
err_alloc_chrdev:
class_destroy(vfio.class);
vfio.class = NULL;
@@ -2350,8 +2339,7 @@ static void __exit vfio_cleanup(void)
#ifdef CONFIG_VFIO_NOIOMMU
vfio_unregister_iommu_driver(&vfio_noiommu_ops);
#endif
- idr_destroy(&vfio.group_idr);
- cdev_del(&vfio.group_cdev);
+ ida_destroy(&vfio.group_ida);
unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
class_destroy(vfio.class);
vfio.class = NULL;
--
2.33.0
* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
@ 2021-10-04 22:25 ` Alex Williamson
2021-10-04 22:34 ` Jason Gunthorpe
2021-10-12 6:32 ` Tian, Kevin
2021-10-12 8:51 ` Liu, Yi L
2 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2021-10-04 22:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Fri, 1 Oct 2021 20:22:20 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> built using a blocking_notifier_chain, which integrates an rwsem. The
> notifier function therefore cannot be running before registration or
> after unregistration returns.
>
> When considering how the notifier function interacts with create/destroy
> of the group there are two fringe cases: the notifier runs before
> list_add(&vfio.group_list) and the notifier runs after the kref
> becomes 0.
>
> Prior to vfio_create_group() unlocking and returning we have
> container_users == 0
> device_list == empty
> And this cannot change until the mutex is unlocked.
>
> After the kref goes to zero we must also have
> container_users == 0
> device_list == empty
>
> Both are required because they are balanced operations and a 0 kref means
> some caller became unbalanced. Add the missing assertion that
> container_users must be zero as well.
>
> These two facts are important because when checking each operation we see:
>
> - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> 0 container_users ends the call
> - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> 0 container_users ends the call
>
> Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> items from the unbound list. During creation this list is empty, during
> kref == 0 nothing can read this list, and it will be freed soon.
>
> Since vfio_group_release() doesn't hold the appropriate lock to
> manipulate the unbound_list and could race with the notifier, move the
> cleanup to directly before the kfree().
>
> This allows deleting all of the deferred group put code.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 89 +++++----------------------------------------
> 1 file changed, 9 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 08b27b64f0f935..32a53cb3598524 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
>
> static void vfio_group_unlock_and_free(struct vfio_group *group)
> {
> + struct vfio_unbound_dev *unbound, *tmp;
> +
> mutex_unlock(&vfio.group_lock);
> /*
> * Unregister outside of lock. A spurious callback is harmless now
> * that the group is no longer in vfio.group_list.
> */
This comment is indirectly referencing the vfio_group_try_get() in the
notifier callback, but as you describe in the commit log, it's actually
the container_users value that prevents this from racing group release
now. Otherwise, tricky but looks good. Thanks,
Alex
> iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +
> + list_for_each_entry_safe(unbound, tmp,
> + &group->unbound_list, unbound_next) {
> + list_del(&unbound->unbound_next);
> + kfree(unbound);
> + }
> kfree(group);
> }
>
> @@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>
> group->nb.notifier_call = vfio_iommu_group_notifier;
>
> - /*
> - * blocking notifiers acquire a rwsem around registering and hold
> - * it around callback. Therefore, need to register outside of
> - * vfio.group_lock to avoid A-B/B-A contention. Our callback won't
> - * do anything unless it can find the group in vfio.group_list, so
> - * no harm in registering early.
> - */
> ret = iommu_group_register_notifier(iommu_group, &group->nb);
> if (ret) {
> kfree(group);
> @@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> static void vfio_group_release(struct kref *kref)
> {
> struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> - struct vfio_unbound_dev *unbound, *tmp;
> struct iommu_group *iommu_group = group->iommu_group;
>
> WARN_ON(!list_empty(&group->device_list));
> + WARN_ON(atomic_read(&group->container_users));
> WARN_ON(group->notifier.head);
>
> - list_for_each_entry_safe(unbound, tmp,
> - &group->unbound_list, unbound_next) {
> - list_del(&unbound->unbound_next);
> - kfree(unbound);
> - }
> -
> device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
> list_del(&group->vfio_next);
> vfio_free_group_minor(group->minor);
> @@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group *group)
> kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
> }
>
> -struct vfio_group_put_work {
> - struct work_struct work;
> - struct vfio_group *group;
> -};
> -
> -static void vfio_group_put_bg(struct work_struct *work)
> -{
> - struct vfio_group_put_work *do_work;
> -
> - do_work = container_of(work, struct vfio_group_put_work, work);
> -
> - vfio_group_put(do_work->group);
> - kfree(do_work);
> -}
> -
> -static void vfio_group_schedule_put(struct vfio_group *group)
> -{
> - struct vfio_group_put_work *do_work;
> -
> - do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
> - if (WARN_ON(!do_work))
> - return;
> -
> - INIT_WORK(&do_work->work, vfio_group_put_bg);
> - do_work->group = group;
> - schedule_work(&do_work->work);
> -}
> -
> /* Assume group_lock or group reference is held */
> static void vfio_group_get(struct vfio_group *group)
> {
> kref_get(&group->kref);
> }
>
> -/*
> - * Not really a try as we will sleep for mutex, but we need to make
> - * sure the group pointer is valid under lock and get a reference.
> - */
> -static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
> -{
> - struct vfio_group *target = group;
> -
> - mutex_lock(&vfio.group_lock);
> - list_for_each_entry(group, &vfio.group_list, vfio_next) {
> - if (group == target) {
> - vfio_group_get(group);
> - mutex_unlock(&vfio.group_lock);
> - return group;
> - }
> - }
> - mutex_unlock(&vfio.group_lock);
> -
> - return NULL;
> -}
> -
> static
> struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> {
> @@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
> struct device *dev = data;
> struct vfio_unbound_dev *unbound;
>
> - /*
> - * Need to go through a group_lock lookup to get a reference or we
> - * risk racing a group being removed. Ignore spurious notifies.
> - */
> - group = vfio_group_try_get(group);
> - if (!group)
> - return NOTIFY_OK;
> -
> switch (action) {
> case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
> vfio_group_nb_add_dev(group, dev);
> @@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
> mutex_unlock(&group->unbound_lock);
> break;
> }
> -
> - /*
> - * If we're the last reference to the group, the group will be
> - * released, which includes unregistering the iommu group notifier.
> - * We hold a read-lock on that notifier list, unregistering needs
> - * a write-lock... deadlock. Release our reference asynchronously
> - * to avoid that situation.
> - */
> - vfio_group_schedule_put(group);
> return NOTIFY_OK;
> }
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
@ 2021-10-04 22:25 ` Alex Williamson
2021-10-04 22:36 ` Jason Gunthorpe
0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2021-10-04 22:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Fri, 1 Oct 2021 20:22:22 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> If vfio_create_group() searches the group list and returns an already
> existing group it does not put back the iommu_group reference that the
> caller passed in.
>
> Change the semantic of vfio_create_group() to not move the reference in
> from the caller, but instead obtain a new reference inside and leave the
> caller's reference alone. The two callers must now call iommu_group_put().
>
> This is an unlikely race as the only caller that could hit it has already
> searched the group list before attempting to create the group.
>
> Fixes: cba3345cc494 ("vfio: VFIO core")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 1cb12033b02240..bf233943dc992f 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
> list_del(&unbound->unbound_next);
> kfree(unbound);
> }
> + iommu_group_put(group->iommu_group);
> kfree(group);
> }
>
> @@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> atomic_set(&group->opened, 0);
> init_waitqueue_head(&group->container_q);
> group->iommu_group = iommu_group;
> + /* put in vfio_group_unlock_and_free() */
> + iommu_group_ref_get(iommu_group);
> group->type = type;
> BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
>
> @@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>
> ret = iommu_group_register_notifier(iommu_group, &group->nb);
> if (ret) {
> - kfree(group);
> - return ERR_PTR(ret);
> + group = ERR_PTR(ret);
> + goto err_put_group;
> }
>
> mutex_lock(&vfio.group_lock);
> @@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>
> mutex_unlock(&vfio.group_lock);
>
> +err_put_group:
> + iommu_group_put(iommu_group);
> + kfree(group);
????
In the non-error path we're releasing the caller's reference which is
now their responsibility to release, but in any case we're freeing the
object that we return? That can't be right.
> return group;
> }
>
> @@ -439,7 +445,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> static void vfio_group_release(struct kref *kref)
> {
> struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> - struct iommu_group *iommu_group = group->iommu_group;
>
> WARN_ON(!list_empty(&group->device_list));
> WARN_ON(atomic_read(&group->container_users));
> @@ -449,7 +454,6 @@ static void vfio_group_release(struct kref *kref)
> list_del(&group->vfio_next);
> vfio_free_group_minor(group->minor);
> vfio_group_unlock_and_free(group);
> - iommu_group_put(iommu_group);
> }
>
> static void vfio_group_put(struct vfio_group *group)
> @@ -734,7 +738,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
> ret = PTR_ERR(group);
> goto out_remove_device;
> }
> -
> + iommu_group_put(iommu_group);
> return group;
>
> out_remove_device:
> @@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
>
> /* a newly created vfio_group keeps the reference. */
This comment is now incorrect. Thanks,
Alex
> group = vfio_create_group(iommu_group, VFIO_IOMMU);
> - if (IS_ERR(group))
> - goto out_put;
> - return group;
> -
> out_put:
> iommu_group_put(iommu_group);
> return group;
* Re: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
@ 2021-10-04 22:25 ` Alex Williamson
2021-10-04 22:39 ` Jason Gunthorpe
2021-10-12 7:08 ` Tian, Kevin
2021-10-12 9:04 ` Liu, Yi L
2 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2021-10-04 22:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Fri, 1 Oct 2021 20:22:23 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> The next patch adds a struct device to the struct vfio_group, and it is
> confusing/bad practice to have two krefs in the same struct. This kref is
> controlling the period when the vfio_group is registered in sysfs, and
> visible in the internal lookup. Switch it to a refcount_t instead.
>
> The refcount_dec_and_mutex_lock() is still required because we need
> atomicity of the list searches and sysfs presence.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bf233943dc992f..dbe7edd88ce35c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
> };
>
> struct vfio_group {
> - struct kref kref;
> + refcount_t users;
Follow indenting for existing structs please. The next patch even
mixes following and changing formatting, so I'm not sure what rule is
being used here. Thanks,
Alex
> int minor;
> atomic_t container_users;
> struct iommu_group *iommu_group;
> @@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> if (!group)
> return ERR_PTR(-ENOMEM);
>
> - kref_init(&group->kref);
> + refcount_set(&group->users, 1);
> INIT_LIST_HEAD(&group->device_list);
> mutex_init(&group->device_lock);
> INIT_LIST_HEAD(&group->unbound_list);
> @@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> return group;
> }
>
> -/* called with vfio.group_lock held */
> -static void vfio_group_release(struct kref *kref)
> +static void vfio_group_put(struct vfio_group *group)
> {
> - struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> + if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
> + return;
>
> WARN_ON(!list_empty(&group->device_list));
> WARN_ON(atomic_read(&group->container_users));
> @@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
> vfio_group_unlock_and_free(group);
> }
>
> -static void vfio_group_put(struct vfio_group *group)
> -{
> - kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
> -}
> -
> -/* Assume group_lock or group reference is held */
> static void vfio_group_get(struct vfio_group *group)
> {
> - kref_get(&group->kref);
> + refcount_inc(&group->users);
> }
>
> static struct vfio_group *vfio_group_get_from_minor(int minor)
> @@ -1662,6 +1656,7 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
> if (ret)
> return ERR_PTR(ret);
>
> + /* Since the caller holds the fget on the file users must be >= 1 */
> vfio_group_get(group);
>
> return group;
* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-04 22:25 ` Alex Williamson
@ 2021-10-04 22:34 ` Jason Gunthorpe
2021-10-05 4:01 ` Alex Williamson
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 22:34 UTC (permalink / raw)
To: Alex Williamson
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, Oct 04, 2021 at 04:25:32PM -0600, Alex Williamson wrote:
> On Fri, 1 Oct 2021 20:22:20 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> > built using a blocking_notifier_chain which integrates a rwsem. The
> > notifier function cannot be running outside its registration.
> >
> > When considering how the notifier function interacts with create/destroy
> > of the group there are two fringe cases, the notifier starts before
> > list_add(&vfio.group_list) and the notifier runs after the kref
> > becomes 0.
> >
> > Prior to vfio_create_group() unlocking and returning we have
> > container_users == 0
> > device_list == empty
> > And this cannot change until the mutex is unlocked.
> >
> > After the kref goes to zero we must also have
> > container_users == 0
> > device_list == empty
> >
> > Both are required because they are balanced operations and a 0 kref means
> > some caller became unbalanced. Add the missing assertion that
> > container_users must be zero as well.
> >
> > These two facts are important because when checking each operation we see:
> >
> > - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> > Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> > 0 container_users ends the call
> > - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> > 0 container_users ends the call
> >
> > Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> > items from the unbound list. During creation this list is empty, during
> > kref == 0 nothing can read this list, and it will be freed soon.
> >
> > Since the vfio_group_release() doesn't hold the appropriate lock to
> > manipulate the unbound_list and could race with the notifier, move the
> > cleanup to directly before the kfree.
> >
> > This allows deleting all of the deferred group put code.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > drivers/vfio/vfio.c | 89 +++++----------------------------------------
> > 1 file changed, 9 insertions(+), 80 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 08b27b64f0f935..32a53cb3598524 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
> >
> > static void vfio_group_unlock_and_free(struct vfio_group *group)
> > {
> > + struct vfio_unbound_dev *unbound, *tmp;
> > +
> > mutex_unlock(&vfio.group_lock);
> > /*
> > * Unregister outside of lock. A spurious callback is harmless now
> > * that the group is no longer in vfio.group_list.
> > */
>
> This comment is indirectly referencing the vfio_group_try_get() in the
> notifier callback, but as you describe in the commit log, it's actually
> the container_users value that prevents this from racing group release
> now. Otherwise, tricky but looks good. Thanks,
Do you think the comment should be deleted in this commit? I think I
got to it later on.
Jason
* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
2021-10-04 22:25 ` Alex Williamson
@ 2021-10-04 22:36 ` Jason Gunthorpe
2021-10-05 4:01 ` Alex Williamson
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 22:36 UTC (permalink / raw)
To: Alex Williamson
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, Oct 04, 2021 at 04:25:43PM -0600, Alex Williamson wrote:
> On Fri, 1 Oct 2021 20:22:22 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > If vfio_create_group() searches the group list and returns an already
> > existing group it does not put back the iommu_group reference that the
> > caller passed in.
> >
> > Change the semantic of vfio_create_group() to not move the reference in
> > from the caller, but instead obtain a new reference inside and leave the
> > caller's reference alone. The two callers must now call iommu_group_put().
> >
> > This is an unlikely race as the only caller that could hit it has already
> > searched the group list before attempting to create the group.
> >
> > Fixes: cba3345cc494 ("vfio: VFIO core")
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > drivers/vfio/vfio.c | 18 +++++++++---------
> > 1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 1cb12033b02240..bf233943dc992f 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
> > list_del(&unbound->unbound_next);
> > kfree(unbound);
> > }
> > + iommu_group_put(group->iommu_group);
> > kfree(group);
> > }
> >
> > @@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > atomic_set(&group->opened, 0);
> > init_waitqueue_head(&group->container_q);
> > group->iommu_group = iommu_group;
> > + /* put in vfio_group_unlock_and_free() */
> > + iommu_group_ref_get(iommu_group);
^^^^^^^^^^^^^^^^^
> > group->type = type;
> > BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> >
> > @@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> >
> > ret = iommu_group_register_notifier(iommu_group, &group->nb);
> > if (ret) {
> > - kfree(group);
> > - return ERR_PTR(ret);
> > + group = ERR_PTR(ret);
> > + goto err_put_group;
> > }
> >
> > mutex_lock(&vfio.group_lock);
> > @@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> >
> > mutex_unlock(&vfio.group_lock);
> >
> > +err_put_group:
> > + iommu_group_put(iommu_group);
> > + kfree(group);
>
> ????
>
> In the non-error path we're releasing the caller's reference which is
> now their responsibility to release,
This release is paired with the get added one hunk above in the same
function.
> but in any case we're freeing the object that we return? That
> can't be right.
Yes, that is a rebasing mistake pulling this back from the last patch
that had a "return ret" here, thanks
> > @@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> >
> > /* a newly created vfio_group keeps the reference. */
>
> This comment is now incorrect. Thanks,
Indeed
Jason
* Re: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
2021-10-04 22:25 ` Alex Williamson
@ 2021-10-04 22:39 ` Jason Gunthorpe
0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 22:39 UTC (permalink / raw)
To: Alex Williamson
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, Oct 04, 2021 at 04:25:51PM -0600, Alex Williamson wrote:
> On Fri, 1 Oct 2021 20:22:23 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > The next patch adds a struct device to the struct vfio_group, and it is
> > confusing/bad practice to have two krefs in the same struct. This kref is
> > controlling the period when the vfio_group is registered in sysfs, and
> > visible in the internal lookup. Switch it to a refcount_t instead.
> >
> > The refcount_dec_and_mutex_lock() is still required because we need
> > atomicity of the list searches and sysfs presence.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > drivers/vfio/vfio.c | 19 +++++++------------
> > 1 file changed, 7 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index bf233943dc992f..dbe7edd88ce35c 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
> > };
> >
> > struct vfio_group {
> > - struct kref kref;
> > + refcount_t users;
>
> Follow indenting for existing structs please. The next patch even
> mixes following and changing formatting, so I'm not sure what rule is
> being used here. Thanks,
Sure, I generally err toward nixing the vertical alignment. It tends
to be a net harm to backporting, and I don't care much for the poor
readability of 30 chars of whitespace between related words. I'll
look over the patches again and check.
Jason
* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-04 22:34 ` Jason Gunthorpe
@ 2021-10-05 4:01 ` Alex Williamson
2021-10-05 16:17 ` Jason Gunthorpe
0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2021-10-05 4:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, 4 Oct 2021 19:34:31 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Mon, Oct 04, 2021 at 04:25:32PM -0600, Alex Williamson wrote:
> > On Fri, 1 Oct 2021 20:22:20 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> > > built using a blocking_notifier_chain which integrates a rwsem. The
> > > notifier function cannot be running outside its registration.
> > >
> > > When considering how the notifier function interacts with create/destroy
> > > of the group there are two fringe cases, the notifier starts before
> > > list_add(&vfio.group_list) and the notifier runs after the kref
> > > becomes 0.
> > >
> > > Prior to vfio_create_group() unlocking and returning we have
> > > container_users == 0
> > > device_list == empty
> > > And this cannot change until the mutex is unlocked.
> > >
> > > After the kref goes to zero we must also have
> > > container_users == 0
> > > device_list == empty
> > >
> > > Both are required because they are balanced operations and a 0 kref means
> > > some caller became unbalanced. Add the missing assertion that
> > > container_users must be zero as well.
> > >
> > > These two facts are important because when checking each operation we see:
> > >
> > > - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> > > Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> > > 0 container_users ends the call
> > > - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> > > 0 container_users ends the call
> > >
> > > Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> > > items from the unbound list. During creation this list is empty, during
> > > kref == 0 nothing can read this list, and it will be freed soon.
> > >
> > > Since the vfio_group_release() doesn't hold the appropriate lock to
> > > manipulate the unbound_list and could race with the notifier, move the
> > > cleanup to directly before the kfree.
> > >
> > > This allows deleting all of the deferred group put code.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > drivers/vfio/vfio.c | 89 +++++----------------------------------------
> > > 1 file changed, 9 insertions(+), 80 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 08b27b64f0f935..32a53cb3598524 100644
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
> > >
> > > static void vfio_group_unlock_and_free(struct vfio_group *group)
> > > {
> > > + struct vfio_unbound_dev *unbound, *tmp;
> > > +
> > > mutex_unlock(&vfio.group_lock);
> > > /*
> > > * Unregister outside of lock. A spurious callback is harmless now
> > > * that the group is no longer in vfio.group_list.
> > > */
> >
> > This comment is indirectly referencing the vfio_group_try_get() in the
> > notifier callback, but as you describe in the commit log, it's actually
> > the container_users value that prevents this from racing group release
> > now. Otherwise, tricky but looks good. Thanks,
>
> Do you think the comment should be deleted in this commit? I think I
> got it later on..
I think the commit log argument is that notifies racing the group
release are harmless so long as the container is unused, and releasing
a group with active container users would be unbalanced, which
justifies the WARN_ON added here. This comment does get removed in the
final patch, but maybe that logic can be noted in a comment by the new
sanity test. Thanks,
Alex
* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
2021-10-04 22:36 ` Jason Gunthorpe
@ 2021-10-05 4:01 ` Alex Williamson
2021-10-05 14:45 ` Jason Gunthorpe
0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2021-10-05 4:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, 4 Oct 2021 19:36:41 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Mon, Oct 04, 2021 at 04:25:43PM -0600, Alex Williamson wrote:
> > On Fri, 1 Oct 2021 20:22:22 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > If vfio_create_group() searches the group list and returns an already
> > > existing group it does not put back the iommu_group reference that the
> > > caller passed in.
> > >
> > > Change the semantic of vfio_create_group() to not move the reference in
> > > from the caller, but instead obtain a new reference inside and leave the
> > > caller's reference alone. The two callers must now call iommu_group_put().
> > >
> > > This is an unlikely race as the only caller that could hit it has already
> > > searched the group list before attempting to create the group.
> > >
> > > Fixes: cba3345cc494 ("vfio: VFIO core")
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > drivers/vfio/vfio.c | 18 +++++++++---------
> > > 1 file changed, 9 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 1cb12033b02240..bf233943dc992f 100644
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
> > > list_del(&unbound->unbound_next);
> > > kfree(unbound);
> > > }
> > > + iommu_group_put(group->iommu_group);
> > > kfree(group);
> > > }
> > >
> > > @@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > > atomic_set(&group->opened, 0);
> > > init_waitqueue_head(&group->container_q);
> > > group->iommu_group = iommu_group;
> > > + /* put in vfio_group_unlock_and_free() */
> > > + iommu_group_ref_get(iommu_group);
>
> ^^^^^^^^^^^^^^^^^
>
> > > group->type = type;
> > > BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> > >
> > > @@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > >
> > > ret = iommu_group_register_notifier(iommu_group, &group->nb);
> > > if (ret) {
> > > - kfree(group);
> > > - return ERR_PTR(ret);
> > > + group = ERR_PTR(ret);
> > > + goto err_put_group;
> > > }
> > >
> > > mutex_lock(&vfio.group_lock);
> > > @@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > >
> > > mutex_unlock(&vfio.group_lock);
> > >
> > > +err_put_group:
> > > + iommu_group_put(iommu_group);
> > > + kfree(group);
> >
> > ????
> >
> > In the non-error path we're releasing the caller's reference which is
> > now their responsibility to release,
>
> This release is paried with the get in the same function added one
> hunk above
Note that this is the common exit path until the last patch in the
series pulls returning the successfully created/found group above the
error condition exit paths. As it stands, this patch unconditionally
releases the reference it claims to newly create. Thanks,
Alex
> > but in any case we're freeing the object that we return? That
> > can't be right.
>
> Yes, that is a rebasing mistake pulling this back from the last patch
> that had a "return ret" here, thanks
>
> > > @@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> > >
> > > /* a newly created vfio_group keeps the reference. */
> >
> > This comment is now incorrect. Thanks,
>
> Indeed
>
> Jason
>
* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
2021-10-05 4:01 ` Alex Williamson
@ 2021-10-05 14:45 ` Jason Gunthorpe
0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-05 14:45 UTC (permalink / raw)
To: Alex Williamson
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, Oct 04, 2021 at 10:01:54PM -0600, Alex Williamson wrote:
> Note that this is the common exit path until the last patch in the
> series pulls returning the successfully created/found group above the
> error condition exit paths. As it stands, this patch unconditionally
> releases the reference it claims to newly create. Thanks,
Oh, I see, you are pointing at the missing return. Yes, I caught that
too when I fixed the rest of it.
Jason
* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-05 4:01 ` Alex Williamson
@ 2021-10-05 16:17 ` Jason Gunthorpe
0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-05 16:17 UTC (permalink / raw)
To: Alex Williamson
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Mon, Oct 04, 2021 at 10:01:52PM -0600, Alex Williamson wrote:
> I think the commit log argument is that notifies racing the group
> release are harmless so long as the container is unused, and releasing
> a group with active container users would be unbalanced, which
> justifies the WARN_ON added here.
Yes
I changed it like this:
@@ -327,6 +327,10 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
struct vfio_unbound_dev *unbound, *tmp;
mutex_unlock(&vfio.group_lock);
+ /*
+ * Unregister outside of lock. A spurious callback is harmless now
+ * that the group is no longer in vfio.group_list.
+ */
iommu_group_unregister_notifier(group->iommu_group, &group->nb);
list_for_each_entry_safe(unbound, tmp,
@@ -413,12 +418,6 @@ static void vfio_group_release(struct kref *kref)
struct vfio_group *group = container_of(kref, struct vfio_group, kref);
struct iommu_group *iommu_group = group->iommu_group;
- /*
- * These data structures all have paired operations that can only be
- * undone when the caller holds a live reference on the group. Since all
- * pairs must be undone these WARN_ON's indicate some caller did not
- * properly hold the group reference.
- */
WARN_ON(!list_empty(&group->device_list));
WARN_ON(atomic_read(&group->container_users));
WARN_ON(group->notifier.head);
Thanks,
Jason
* RE: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
@ 2021-10-12 6:32 ` Tian, Kevin
2021-10-12 8:51 ` Liu, Yi L
2 siblings, 0 replies; 28+ messages in thread
From: Tian, Kevin @ 2021-10-12 6:32 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Liu, Yi L, Lu, Baolu
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> built using a blocking_notifier_chain which integrates a rwsem. The
> notifier function cannot be running outside its registration.
>
> When considering how the notifier function interacts with create/destroy
> of the group there are two fringe cases, the notifier starts before
> list_add(&vfio.group_list) and the notifier runs after the kref
> becomes 0.
>
> Prior to vfio_create_group() unlocking and returning we have
> container_users == 0
> device_list == empty
> And this cannot change until the mutex is unlocked.
>
> After the kref goes to zero we must also have
> container_users == 0
> device_list == empty
>
> Both are required because they are balanced operations and a 0 kref means
> some caller became unbalanced. Add the missing assertion that
> container_users must be zero as well.
>
> These two facts are important because when checking each operation we see:
>
> - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> 0 container_users ends the call
> - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> 0 container_users ends the call
>
> Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> items from the unbound list. During creation this list is empty, during
> kref == 0 nothing can read this list, and it will be freed soon.
>
> Since the vfio_group_release() doesn't hold the appropriate lock to
> manipulate the unbound_list and could race with the notifier, move the
> cleanup to directly before the kfree.
>
> This allows deleting all of the deferred group put code.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
btw once we get dma ownership check in iommu core, the entire
iommu group notifier and unbound list will be removed. Then this
deferred code will be unnecessary even w/o this tricky but precise
analysis. 😊
> ---
> drivers/vfio/vfio.c | 89 +++++----------------------------------------
> 1 file changed, 9 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 08b27b64f0f935..32a53cb3598524 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -324,12 +324,20 @@ static void vfio_container_put(struct
> vfio_container *container)
>
> static void vfio_group_unlock_and_free(struct vfio_group *group)
> {
> + struct vfio_unbound_dev *unbound, *tmp;
> +
> mutex_unlock(&vfio.group_lock);
> /*
> * Unregister outside of lock. A spurious callback is harmless now
> * that the group is no longer in vfio.group_list.
> */
> iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +
> + list_for_each_entry_safe(unbound, tmp,
> + &group->unbound_list, unbound_next) {
> + list_del(&unbound->unbound_next);
> + kfree(unbound);
> + }
> kfree(group);
> }
>
> @@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>
> group->nb.notifier_call = vfio_iommu_group_notifier;
>
> - /*
> - * blocking notifiers acquire a rwsem around registering and hold
> - * it around callback. Therefore, need to register outside of
> - * vfio.group_lock to avoid A-B/B-A contention. Our callback won't
> - * do anything unless it can find the group in vfio.group_list, so
> - * no harm in registering early.
> - */
> ret = iommu_group_register_notifier(iommu_group, &group->nb);
> if (ret) {
> kfree(group);
> @@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> static void vfio_group_release(struct kref *kref)
> {
> struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> - struct vfio_unbound_dev *unbound, *tmp;
> struct iommu_group *iommu_group = group->iommu_group;
>
> WARN_ON(!list_empty(&group->device_list));
> + WARN_ON(atomic_read(&group->container_users));
> WARN_ON(group->notifier.head);
>
> - list_for_each_entry_safe(unbound, tmp,
> - &group->unbound_list, unbound_next) {
> - list_del(&unbound->unbound_next);
> - kfree(unbound);
> - }
> -
> device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
> list_del(&group->vfio_next);
> vfio_free_group_minor(group->minor);
> @@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group
> *group)
> kref_put_mutex(&group->kref, vfio_group_release,
> &vfio.group_lock);
> }
>
> -struct vfio_group_put_work {
> - struct work_struct work;
> - struct vfio_group *group;
> -};
> -
> -static void vfio_group_put_bg(struct work_struct *work)
> -{
> - struct vfio_group_put_work *do_work;
> -
> - do_work = container_of(work, struct vfio_group_put_work, work);
> -
> - vfio_group_put(do_work->group);
> - kfree(do_work);
> -}
> -
> -static void vfio_group_schedule_put(struct vfio_group *group)
> -{
> - struct vfio_group_put_work *do_work;
> -
> - do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
> - if (WARN_ON(!do_work))
> - return;
> -
> - INIT_WORK(&do_work->work, vfio_group_put_bg);
> - do_work->group = group;
> - schedule_work(&do_work->work);
> -}
> -
> /* Assume group_lock or group reference is held */
> static void vfio_group_get(struct vfio_group *group)
> {
> kref_get(&group->kref);
> }
>
> -/*
> - * Not really a try as we will sleep for mutex, but we need to make
> - * sure the group pointer is valid under lock and get a reference.
> - */
> -static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
> -{
> - struct vfio_group *target = group;
> -
> - mutex_lock(&vfio.group_lock);
> - list_for_each_entry(group, &vfio.group_list, vfio_next) {
> - if (group == target) {
> - vfio_group_get(group);
> - mutex_unlock(&vfio.group_lock);
> - return group;
> - }
> - }
> - mutex_unlock(&vfio.group_lock);
> -
> - return NULL;
> -}
> -
> static
> struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
> {
> @@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
> struct device *dev = data;
> struct vfio_unbound_dev *unbound;
>
> - /*
> - * Need to go through a group_lock lookup to get a reference or we
> - * risk racing a group being removed. Ignore spurious notifies.
> - */
> - group = vfio_group_try_get(group);
> - if (!group)
> - return NOTIFY_OK;
> -
> switch (action) {
> case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
> vfio_group_nb_add_dev(group, dev);
> @@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
> mutex_unlock(&group->unbound_lock);
> break;
> }
> -
> - /*
> - * If we're the last reference to the group, the group will be
> - * released, which includes unregistering the iommu group notifier.
> - * We hold a read-lock on that notifier list, unregistering needs
> - * a write-lock... deadlock. Release our reference asynchronously
> - * to avoid that situation.
> - */
> - vfio_group_schedule_put(group);
> return NOTIFY_OK;
> }
>
> --
> 2.33.0
* RE: [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
@ 2021-10-12 6:37 ` Tian, Kevin
2021-10-12 8:52 ` Liu, Yi L
1 sibling, 0 replies; 28+ messages in thread
From: Tian, Kevin @ 2021-10-12 6:37 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Liu, Yi L
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
> that vfio_create_group() can call it to consolidate this duplicated code.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
> 1 file changed, 30 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 32a53cb3598524..1cb12033b02240 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -344,10 +344,35 @@ static void vfio_group_unlock_and_free(struct
> vfio_group *group)
> /**
> * Group objects - create, release, get, put, search
> */
> +static struct vfio_group *
> +__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> + struct vfio_group *group;
> +
> + list_for_each_entry(group, &vfio.group_list, vfio_next) {
> + if (group->iommu_group == iommu_group) {
> + vfio_group_get(group);
> + return group;
> + }
> + }
> + return NULL;
> +}
> +
> +static struct vfio_group *
> +vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> + struct vfio_group *group;
> +
> + mutex_lock(&vfio.group_lock);
> + group = __vfio_group_get_from_iommu(iommu_group);
> + mutex_unlock(&vfio.group_lock);
> + return group;
> +}
> +
> static struct vfio_group *vfio_create_group(struct iommu_group
> *iommu_group,
> enum vfio_group_type type)
> {
> - struct vfio_group *group, *tmp;
> + struct vfio_group *group, *existing_group;
> struct device *dev;
> int ret, minor;
>
> @@ -378,12 +403,10 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> mutex_lock(&vfio.group_lock);
>
> /* Did we race creating this group? */
> - list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
> - if (tmp->iommu_group == iommu_group) {
> - vfio_group_get(tmp);
> - vfio_group_unlock_and_free(group);
> - return tmp;
> - }
> + existing_group = __vfio_group_get_from_iommu(iommu_group);
> + if (existing_group) {
> + vfio_group_unlock_and_free(group);
> + return existing_group;
> }
>
> minor = vfio_alloc_group_minor(group);
> @@ -440,24 +463,6 @@ static void vfio_group_get(struct vfio_group *group)
> kref_get(&group->kref);
> }
>
> -static
> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
> -{
> - struct vfio_group *group;
> -
> - mutex_lock(&vfio.group_lock);
> - list_for_each_entry(group, &vfio.group_list, vfio_next) {
> - if (group->iommu_group == iommu_group) {
> - vfio_group_get(group);
> - mutex_unlock(&vfio.group_lock);
> - return group;
> - }
> - }
> - mutex_unlock(&vfio.group_lock);
> -
> - return NULL;
> -}
> -
> static struct vfio_group *vfio_group_get_from_minor(int minor)
> {
> struct vfio_group *group;
> --
> 2.33.0
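The locked/unlocked helper split above is a common kernel pattern: a double-underscore variant that requires the caller to hold the lock, wrapped by a public variant that takes it. A minimal userspace sketch (pthread mutex, stand-in data — not the real vfio.group_list walk):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static const int groups[4] = { 3, 7, 11, 42 };

/* Caller must hold group_lock, like __vfio_group_get_from_iommu(). */
static int __lookup(int id)
{
	for (size_t i = 0; i < 4; i++)
		if (groups[i] == id)
			return (int)i;
	return -1;
}

/* Public wrapper takes the lock, like vfio_group_get_from_iommu(). */
static int lookup(int id)
{
	pthread_mutex_lock(&group_lock);
	int idx = __lookup(id);
	pthread_mutex_unlock(&group_lock);
	return idx;
}
```

The point of the split is that a caller already inside the lock (here, the create-race check in vfio_create_group()) can reuse the search without deadlocking on a recursive acquire.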
* RE: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
@ 2021-10-12 7:08 ` Tian, Kevin
2021-10-12 9:04 ` Liu, Yi L
2 siblings, 0 replies; 28+ messages in thread
From: Tian, Kevin @ 2021-10-12 7:08 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Liu, Yi L
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> The next patch adds a struct device to the struct vfio_group, and it is
> confusing/bad practice to have two krefs in the same struct. This kref is
> controlling the period when the vfio_group is registered in sysfs, and
> visible in the internal lookup. Switch it to a refcount_t instead.
>
> The refcount_dec_and_mutex_lock() is still required because we need
> atomicity of the list searches and sysfs presence.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> drivers/vfio/vfio.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bf233943dc992f..dbe7edd88ce35c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
> };
>
> struct vfio_group {
> - struct kref kref;
> + refcount_t users;
> int minor;
> atomic_t container_users;
> struct iommu_group *iommu_group;
> @@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> if (!group)
> return ERR_PTR(-ENOMEM);
>
> - kref_init(&group->kref);
> + refcount_set(&group->users, 1);
> INIT_LIST_HEAD(&group->device_list);
> mutex_init(&group->device_lock);
> INIT_LIST_HEAD(&group->unbound_list);
> @@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> return group;
> }
>
> -/* called with vfio.group_lock held */
> -static void vfio_group_release(struct kref *kref)
> +static void vfio_group_put(struct vfio_group *group)
> {
> - struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> + if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
> + return;
>
> WARN_ON(!list_empty(&group->device_list));
> WARN_ON(atomic_read(&group->container_users));
> @@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
> vfio_group_unlock_and_free(group);
> }
>
> -static void vfio_group_put(struct vfio_group *group)
> -{
> - kref_put_mutex(&group->kref, vfio_group_release,
> &vfio.group_lock);
> -}
> -
> -/* Assume group_lock or group reference is held */
> static void vfio_group_get(struct vfio_group *group)
> {
> - kref_get(&group->kref);
> + refcount_inc(&group->users);
> }
>
> static struct vfio_group *vfio_group_get_from_minor(int minor)
> @@ -1662,6 +1656,7 @@ struct vfio_group
> *vfio_group_get_external_user(struct file *filep)
> if (ret)
> return ERR_PTR(ret);
>
> + /* Since the caller holds the fget on the file users must be >= 1 */
> vfio_group_get(group);
>
> return group;
> --
> 2.33.0
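The contract of refcount_dec_and_mutex_lock() that the commit message relies on: only the caller that drops the count to zero returns true, and it returns with the mutex held, so the final put is atomic with removing the group from the list that searches walk. A userspace sketch of that contract (pthread/stdatomic approximation, not the kernel implementation):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Return true, holding the lock, only for the final reference drop.
 * The caller then frees the object and unlocks.
 */
static bool refcount_dec_and_mutex_lock(atomic_int *r, pthread_mutex_t *lock)
{
	/* Fast path: drop a reference that is clearly not the last. */
	int old = atomic_load(r);
	while (old > 1) {
		if (atomic_compare_exchange_weak(r, &old, old - 1))
			return false;
	}
	/* Possibly the last reference: decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(r, 1) != 1) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;
}
```

With this shape, vfio_group_put() can do its WARN_ON checks and list_del knowing no concurrent list search can still find and re-get the dying group.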
* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
@ 2021-10-12 8:33 ` Tian, Kevin
2021-10-12 12:05 ` Jason Gunthorpe
2021-10-12 8:57 ` Liu, Yi L
1 sibling, 1 reply; 28+ messages in thread
From: Tian, Kevin @ 2021-10-12 8:33 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Liu, Yi L
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
[...]
> +static void vfio_group_release(struct device *dev)
> {
> - struct vfio_group *group, *existing_group;
> - struct device *dev;
> - int ret, minor;
> + struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> + struct vfio_unbound_dev *unbound, *tmp;
> +
> + list_for_each_entry_safe(unbound, tmp,
> + &group->unbound_list, unbound_next) {
> + list_del(&unbound->unbound_next);
> + kfree(unbound);
> + }
Move this to vfio_group_put()? It is not paired with vfio_group_alloc()...
[...]
> static int vfio_group_fops_open(struct inode *inode, struct file *filep)
> {
> - struct vfio_group *group;
> + struct vfio_group *group =
> + container_of(inode->i_cdev, struct vfio_group, cdev);
> int opened;
A curiosity question. According to cdev_device_del(), any cdev that is
already open remains with its fops callable. What prevents the vfio_group
from being released after cdev_device_del() has been called? Is it
because an open cdev holds a reference on the struct device, so
put_device() will not drop to zero in vfio_group_put()?
Thanks
Kevin
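The navigation the cover letter describes — putting the struct device and struct cdev inside the vfio_group and using container_of() to get back from inode->i_cdev in fops open — can be sketched in userspace with stand-in types (the real code uses the kernel's struct device and struct cdev):

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for struct device / struct cdev. */
struct fake_device { int devt; };
struct fake_cdev   { const void *ops; };

struct vfio_group {
	struct fake_device dev;
	struct fake_cdev cdev;
	int minor;
};

/* What vfio_group_fops_open() does with inode->i_cdev. */
static struct vfio_group *group_from_cdev(struct fake_cdev *cdev)
{
	return container_of(cdev, struct vfio_group, cdev);
}
```

Because all three layers live in one allocation, no IDR lookup is needed at open time; the cdev pointer alone recovers the group.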
* RE: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
2021-10-12 6:32 ` Tian, Kevin
@ 2021-10-12 8:51 ` Liu, Yi L
2 siblings, 0 replies; 28+ messages in thread
From: Liu, Yi L @ 2021-10-12 8:51 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> built using a blocking_notifier_chain which integrates a rwsem. The
> notifier function cannot be running outside its registration.
>
> When considering how the notifier function interacts with create/destroy
> of the group there are two fringe cases, the notifier starts before
> list_add(&vfio.group_list) and the notifier runs after the kref
> becomes 0.
>
> Prior to vfio_create_group() unlocking and returning we have
> container_users == 0
> device_list == empty
> And this cannot change until the mutex is unlocked.
>
> After the kref goes to zero we must also have
> container_users == 0
> device_list == empty
>
> Both are required because they are balanced operations and a 0 kref means
> some caller became unbalanced. Add the missing assertion that
> container_users must be zero as well.
>
> These two facts are important because when checking each operation we see:
>
> - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> 0 container_users ends the call
> - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> 0 container_users ends the call
>
> Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> items from the unbound list. During creation this list is empty, during
> kref == 0 nothing can read this list, and it will be freed soon.
>
> Since the vfio_group_release() doesn't hold the appropriate lock to
> manipulate the unbound_list and could race with the notifier, move the
> cleanup to directly before the kfree.
>
> This allows deleting all of the deferred group put code.
Looks good to me.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Regards,
Yi Liu
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 89 +++++----------------------------------------
> 1 file changed, 9 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 08b27b64f0f935..32a53cb3598524 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container
> *container)
>
> static void vfio_group_unlock_and_free(struct vfio_group *group)
> {
> + struct vfio_unbound_dev *unbound, *tmp;
> +
> mutex_unlock(&vfio.group_lock);
> /*
> * Unregister outside of lock. A spurious callback is harmless now
> * that the group is no longer in vfio.group_list.
> */
> iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +
> + list_for_each_entry_safe(unbound, tmp,
> + &group->unbound_list, unbound_next) {
> + list_del(&unbound->unbound_next);
> + kfree(unbound);
> + }
> kfree(group);
> }
>
> @@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>
> group->nb.notifier_call = vfio_iommu_group_notifier;
>
> - /*
> - * blocking notifiers acquire a rwsem around registering and hold
> - * it around callback. Therefore, need to register outside of
> - * vfio.group_lock to avoid A-B/B-A contention. Our callback won't
> - * do anything unless it can find the group in vfio.group_list, so
> - * no harm in registering early.
> - */
> ret = iommu_group_register_notifier(iommu_group, &group->nb);
> if (ret) {
> kfree(group);
> @@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> static void vfio_group_release(struct kref *kref)
> {
> struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> - struct vfio_unbound_dev *unbound, *tmp;
> struct iommu_group *iommu_group = group->iommu_group;
>
> WARN_ON(!list_empty(&group->device_list));
> + WARN_ON(atomic_read(&group->container_users));
> WARN_ON(group->notifier.head);
>
> - list_for_each_entry_safe(unbound, tmp,
> - &group->unbound_list, unbound_next) {
> - list_del(&unbound->unbound_next);
> - kfree(unbound);
> - }
> -
> device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
> list_del(&group->vfio_next);
> vfio_free_group_minor(group->minor);
> @@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group *group)
> kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
> }
>
> -struct vfio_group_put_work {
> - struct work_struct work;
> - struct vfio_group *group;
> -};
> -
> -static void vfio_group_put_bg(struct work_struct *work)
> -{
> - struct vfio_group_put_work *do_work;
> -
> - do_work = container_of(work, struct vfio_group_put_work, work);
> -
> - vfio_group_put(do_work->group);
> - kfree(do_work);
> -}
> -
> -static void vfio_group_schedule_put(struct vfio_group *group)
> -{
> - struct vfio_group_put_work *do_work;
> -
> - do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
> - if (WARN_ON(!do_work))
> - return;
> -
> - INIT_WORK(&do_work->work, vfio_group_put_bg);
> - do_work->group = group;
> - schedule_work(&do_work->work);
> -}
> -
> /* Assume group_lock or group reference is held */
> static void vfio_group_get(struct vfio_group *group)
> {
> kref_get(&group->kref);
> }
>
> -/*
> - * Not really a try as we will sleep for mutex, but we need to make
> - * sure the group pointer is valid under lock and get a reference.
> - */
> -static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
> -{
> - struct vfio_group *target = group;
> -
> - mutex_lock(&vfio.group_lock);
> - list_for_each_entry(group, &vfio.group_list, vfio_next) {
> - if (group == target) {
> - vfio_group_get(group);
> - mutex_unlock(&vfio.group_lock);
> - return group;
> - }
> - }
> - mutex_unlock(&vfio.group_lock);
> -
> - return NULL;
> -}
> -
> static
> struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
> {
> @@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
> struct device *dev = data;
> struct vfio_unbound_dev *unbound;
>
> - /*
> - * Need to go through a group_lock lookup to get a reference or we
> - * risk racing a group being removed. Ignore spurious notifies.
> - */
> - group = vfio_group_try_get(group);
> - if (!group)
> - return NOTIFY_OK;
> -
> switch (action) {
> case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
> vfio_group_nb_add_dev(group, dev);
> @@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
> mutex_unlock(&group->unbound_lock);
> break;
> }
> -
> - /*
> - * If we're the last reference to the group, the group will be
> - * released, which includes unregistering the iommu group notifier.
> - * We hold a read-lock on that notifier list, unregistering needs
> - * a write-lock... deadlock. Release our reference asynchronously
> - * to avoid that situation.
> - */
> - vfio_group_schedule_put(group);
> return NOTIFY_OK;
> }
>
> --
> 2.33.0
* RE: [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
2021-10-12 6:37 ` Tian, Kevin
@ 2021-10-12 8:52 ` Liu, Yi L
1 sibling, 0 replies; 28+ messages in thread
From: Liu, Yi L @ 2021-10-12 8:52 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
> that vfio_create_group() can call it to consolidate this duplicated code.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Regards,
Yi Liu
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
> 1 file changed, 30 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 32a53cb3598524..1cb12033b02240 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -344,10 +344,35 @@ static void vfio_group_unlock_and_free(struct
> vfio_group *group)
> /**
> * Group objects - create, release, get, put, search
> */
> +static struct vfio_group *
> +__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> + struct vfio_group *group;
> +
> + list_for_each_entry(group, &vfio.group_list, vfio_next) {
> + if (group->iommu_group == iommu_group) {
> + vfio_group_get(group);
> + return group;
> + }
> + }
> + return NULL;
> +}
> +
> +static struct vfio_group *
> +vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> + struct vfio_group *group;
> +
> + mutex_lock(&vfio.group_lock);
> + group = __vfio_group_get_from_iommu(iommu_group);
> + mutex_unlock(&vfio.group_lock);
> + return group;
> +}
> +
> static struct vfio_group *vfio_create_group(struct iommu_group
> *iommu_group,
> enum vfio_group_type type)
> {
> - struct vfio_group *group, *tmp;
> + struct vfio_group *group, *existing_group;
> struct device *dev;
> int ret, minor;
>
> @@ -378,12 +403,10 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> mutex_lock(&vfio.group_lock);
>
> /* Did we race creating this group? */
> - list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
> - if (tmp->iommu_group == iommu_group) {
> - vfio_group_get(tmp);
> - vfio_group_unlock_and_free(group);
> - return tmp;
> - }
> + existing_group = __vfio_group_get_from_iommu(iommu_group);
> + if (existing_group) {
> + vfio_group_unlock_and_free(group);
> + return existing_group;
> }
>
> minor = vfio_alloc_group_minor(group);
> @@ -440,24 +463,6 @@ static void vfio_group_get(struct vfio_group *group)
> kref_get(&group->kref);
> }
>
> -static
> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
> -{
> - struct vfio_group *group;
> -
> - mutex_lock(&vfio.group_lock);
> - list_for_each_entry(group, &vfio.group_list, vfio_next) {
> - if (group->iommu_group == iommu_group) {
> - vfio_group_get(group);
> - mutex_unlock(&vfio.group_lock);
> - return group;
> - }
> - }
> - mutex_unlock(&vfio.group_lock);
> -
> - return NULL;
> -}
> -
> static struct vfio_group *vfio_group_get_from_minor(int minor)
> {
> struct vfio_group *group;
> --
> 2.33.0
* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
2021-10-12 8:33 ` Tian, Kevin
@ 2021-10-12 8:57 ` Liu, Yi L
2021-10-13 12:49 ` Jason Gunthorpe
1 sibling, 1 reply; 28+ messages in thread
From: Liu, Yi L @ 2021-10-12 8:57 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> Modernize how vfio is creating the group char dev and sysfs presence.
>
> These days drivers with state should use cdev_device_add() and
> cdev_device_del() to manage the cdev and sysfs lifetime.
>
> This API requires the driver to put the struct device and struct cdev
> inside its state struct (vfio_group), and then use the usual
> device_initialize()/cdev_device_add()/cdev_device_del() sequence.
>
> Split the code to make this possible:
>
> - vfio_group_alloc()/vfio_group_release() are pair'd functions to
> alloc/free the vfio_group. release is done under the struct device
> kref.
>
> - vfio_create_group()/vfio_group_put() are pairs that manage the
> sysfs/cdev lifetime. Once the users count is zero the vfio group's
> userspace presence is destroyed.
>
> - The IDR is replaced with an IDA. container_of(inode->i_cdev)
> is used to get back to the vfio_group during fops open. The IDA
> assigns unique minor numbers.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 200 +++++++++++++++++++++-----------------------
> 1 file changed, 94 insertions(+), 106 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index dbe7edd88ce35c..01e04947250f40 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -43,9 +43,8 @@ static struct vfio {
> struct list_head iommu_drivers_list;
> struct mutex iommu_drivers_lock;
> struct list_head group_list;
> - struct idr group_idr;
> - struct mutex group_lock;
> - struct cdev group_cdev;
> + struct mutex group_lock; /* locks group_list */
> + struct ida group_ida;
> dev_t group_devt;
> } vfio;
>
> @@ -69,14 +68,14 @@ struct vfio_unbound_dev {
> };
>
> struct vfio_group {
> + struct device dev;
> + struct cdev cdev;
> refcount_t users;
> - int minor;
> atomic_t container_users;
> struct iommu_group *iommu_group;
> struct vfio_container *container;
> struct list_head device_list;
> struct mutex device_lock;
> - struct device *dev;
> struct notifier_block nb;
> struct list_head vfio_next;
> struct list_head container_next;
> @@ -98,6 +97,7 @@ MODULE_PARM_DESC(enable_unsafe_noiommu_mode,
> "Enable UNSAFE, no-IOMMU mode. Thi
> #endif
>
> static DEFINE_XARRAY(vfio_device_set_xa);
> +static const struct file_operations vfio_group_fops;
>
> int vfio_assign_device_set(struct vfio_device *device, void *set_id)
> {
> @@ -281,19 +281,6 @@ void vfio_unregister_iommu_driver(const struct
> vfio_iommu_driver_ops *ops)
> }
> EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
>
> -/**
> - * Group minor allocation/free - both called with vfio.group_lock held
> - */
> -static int vfio_alloc_group_minor(struct vfio_group *group)
> -{
> - return idr_alloc(&vfio.group_idr, group, 0, MINORMASK + 1,
> GFP_KERNEL);
> -}
> -
> -static void vfio_free_group_minor(int minor)
> -{
> - idr_remove(&vfio.group_idr, minor);
> -}
> -
> static int vfio_iommu_group_notifier(struct notifier_block *nb,
> unsigned long action, void *data);
> static void vfio_group_get(struct vfio_group *group);
> @@ -322,26 +309,6 @@ static void vfio_container_put(struct vfio_container
> *container)
> kref_put(&container->kref, vfio_container_release);
> }
>
> -static void vfio_group_unlock_and_free(struct vfio_group *group)
> -{
> - struct vfio_unbound_dev *unbound, *tmp;
> -
> - mutex_unlock(&vfio.group_lock);
> - /*
> - * Unregister outside of lock. A spurious callback is harmless now
> - * that the group is no longer in vfio.group_list.
> - */
> - iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> -
> - list_for_each_entry_safe(unbound, tmp,
> - &group->unbound_list, unbound_next) {
> - list_del(&unbound->unbound_next);
> - kfree(unbound);
> - }
> - iommu_group_put(group->iommu_group);
> - kfree(group);
> -}
> -
> /**
> * Group objects - create, release, get, put, search
> */
> @@ -370,75 +337,112 @@ vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
> return group;
> }
>
> -static struct vfio_group *vfio_create_group(struct iommu_group
> *iommu_group,
> - enum vfio_group_type type)
> +static void vfio_group_release(struct device *dev)
> {
> - struct vfio_group *group, *existing_group;
> - struct device *dev;
> - int ret, minor;
> + struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> + struct vfio_unbound_dev *unbound, *tmp;
> +
> + list_for_each_entry_safe(unbound, tmp,
> + &group->unbound_list, unbound_next) {
> + list_del(&unbound->unbound_next);
> + kfree(unbound);
> + }
> +
> + mutex_destroy(&group->device_lock);
> + mutex_destroy(&group->unbound_lock);
> + iommu_group_put(group->iommu_group);
> + ida_free(&vfio.group_ida, MINOR(group->dev.devt));
> + kfree(group);
> +}
> +
> +static struct vfio_group *vfio_group_alloc(struct iommu_group
> *iommu_group,
> + enum vfio_group_type type)
> +{
> + struct vfio_group *group;
> + int minor;
>
> group = kzalloc(sizeof(*group), GFP_KERNEL);
> if (!group)
> return ERR_PTR(-ENOMEM);
>
> + minor = ida_alloc_max(&vfio.group_ida, MINORMASK, GFP_KERNEL);
> + if (minor < 0) {
> + kfree(group);
> + return ERR_PTR(minor);
> + }
> +
> + device_initialize(&group->dev);
> + group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
> + group->dev.class = vfio.class;
> + group->dev.release = vfio_group_release;
> + cdev_init(&group->cdev, &vfio_group_fops);
> + group->cdev.owner = THIS_MODULE;
> +
> refcount_set(&group->users, 1);
> INIT_LIST_HEAD(&group->device_list);
> mutex_init(&group->device_lock);
> INIT_LIST_HEAD(&group->unbound_list);
> mutex_init(&group->unbound_lock);
> - atomic_set(&group->container_users, 0);
> - atomic_set(&group->opened, 0);
> init_waitqueue_head(&group->container_q);
> group->iommu_group = iommu_group;
> - /* put in vfio_group_unlock_and_free() */
> + /* put in vfio_group_release() */
> iommu_group_ref_get(iommu_group);
> group->type = type;
> BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
>
> - group->nb.notifier_call = vfio_iommu_group_notifier;
> + return group;
> +}
>
> - ret = iommu_group_register_notifier(iommu_group, &group->nb);
> - if (ret) {
> - group = ERR_PTR(ret);
> - goto err_put_group;
> +static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> + enum vfio_group_type type)
> +{
> + struct vfio_group *group;
> + struct vfio_group *ret;
> + int err;
> +
> + group = vfio_group_alloc(iommu_group, type);
> + if (IS_ERR(group))
> + return group;
> +
> + err = dev_set_name(&group->dev, "%s%d",
> + group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
> + iommu_group_id(iommu_group));
> + if (err) {
> + ret = ERR_PTR(err);
> + goto err_put;
> + }
> +
> + group->nb.notifier_call = vfio_iommu_group_notifier;
> + err = iommu_group_register_notifier(iommu_group, &group->nb);
> + if (err) {
> + ret = ERR_PTR(err);
> + goto err_put;
> }
>
> mutex_lock(&vfio.group_lock);
>
> /* Did we race creating this group? */
> - existing_group = __vfio_group_get_from_iommu(iommu_group);
> - if (existing_group) {
> - vfio_group_unlock_and_free(group);
> - return existing_group;
> - }
> + ret = __vfio_group_get_from_iommu(iommu_group);
> + if (ret)
> + goto err_unlock;
>
> - minor = vfio_alloc_group_minor(group);
> - if (minor < 0) {
> - vfio_group_unlock_and_free(group);
> - return ERR_PTR(minor);
> + err = cdev_device_add(&group->cdev, &group->dev);
> + if (err) {
> + ret = ERR_PTR(err);
> + goto err_unlock;
Should this error branch put back the vfio_group reference acquired
above by __vfio_group_get_from_iommu()?
Regards,
Yi Liu
> }
>
> - dev = device_create(vfio.class, NULL,
> - MKDEV(MAJOR(vfio.group_devt), minor), group, "%s%d",
> - group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
> - iommu_group_id(iommu_group));
> - if (IS_ERR(dev)) {
> - vfio_free_group_minor(minor);
> - vfio_group_unlock_and_free(group);
> - return ERR_CAST(dev);
> - }
> -
> - group->minor = minor;
> - group->dev = dev;
> -
> list_add(&group->vfio_next, &vfio.group_list);
>
> mutex_unlock(&vfio.group_lock);
> -
> -err_put_group:
> - iommu_group_put(iommu_group);
> - kfree(group);
> return group;
> +
> +err_unlock:
> + mutex_unlock(&vfio.group_lock);
> + iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +err_put:
> + put_device(&group->dev);
> + return ret;
> }
>
> static void vfio_group_put(struct vfio_group *group)
> @@ -450,10 +454,12 @@ static void vfio_group_put(struct vfio_group *group)
> WARN_ON(atomic_read(&group->container_users));
> WARN_ON(group->notifier.head);
>
> - device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
> list_del(&group->vfio_next);
> - vfio_free_group_minor(group->minor);
> - vfio_group_unlock_and_free(group);
> + cdev_device_del(&group->cdev, &group->dev);
> + mutex_unlock(&vfio.group_lock);
> +
> + iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> + put_device(&group->dev);
> }
>
> static void vfio_group_get(struct vfio_group *group)
> @@ -461,20 +467,10 @@ static void vfio_group_get(struct vfio_group *group)
> refcount_inc(&group->users);
> }
>
> -static struct vfio_group *vfio_group_get_from_minor(int minor)
> +/* returns true if the get was obtained */
> +static bool vfio_group_try_get(struct vfio_group *group)
> {
> - struct vfio_group *group;
> -
> - mutex_lock(&vfio.group_lock);
> - group = idr_find(&vfio.group_idr, minor);
> - if (!group) {
> - mutex_unlock(&vfio.group_lock);
> - return NULL;
> - }
> - vfio_group_get(group);
> - mutex_unlock(&vfio.group_lock);
> -
> - return group;
> + return refcount_inc_not_zero(&group->users);
> }
>
> static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
> @@ -1484,11 +1480,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
>
> static int vfio_group_fops_open(struct inode *inode, struct file *filep)
> {
> - struct vfio_group *group;
> + struct vfio_group *group =
> + container_of(inode->i_cdev, struct vfio_group, cdev);
> int opened;
>
> - group = vfio_group_get_from_minor(iminor(inode));
> - if (!group)
> + if (!vfio_group_try_get(group))
> return -ENODEV;
>
> if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
> @@ -2296,7 +2292,7 @@ static int __init vfio_init(void)
> {
> int ret;
>
> - idr_init(&vfio.group_idr);
> + ida_init(&vfio.group_ida);
> mutex_init(&vfio.group_lock);
> mutex_init(&vfio.iommu_drivers_lock);
> INIT_LIST_HEAD(&vfio.group_list);
> @@ -2321,11 +2317,6 @@ static int __init vfio_init(void)
> if (ret)
> goto err_alloc_chrdev;
>
> - cdev_init(&vfio.group_cdev, &vfio_group_fops);
> - ret = cdev_add(&vfio.group_cdev, vfio.group_devt, MINORMASK + 1);
> - if (ret)
> - goto err_cdev_add;
> -
> pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>
> #ifdef CONFIG_VFIO_NOIOMMU
> @@ -2333,8 +2324,6 @@ static int __init vfio_init(void)
> #endif
> return 0;
>
> -err_cdev_add:
> - unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
> err_alloc_chrdev:
> class_destroy(vfio.class);
> vfio.class = NULL;
> @@ -2350,8 +2339,7 @@ static void __exit vfio_cleanup(void)
> #ifdef CONFIG_VFIO_NOIOMMU
> vfio_unregister_iommu_driver(&vfio_noiommu_ops);
> #endif
> - idr_destroy(&vfio.group_idr);
> - cdev_del(&vfio.group_cdev);
> + ida_destroy(&vfio.group_ida);
> unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
> class_destroy(vfio.class);
> vfio.class = NULL;
> --
> 2.33.0
* RE: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
2021-10-12 7:08 ` Tian, Kevin
@ 2021-10-12 9:04 ` Liu, Yi L
2 siblings, 0 replies; 28+ messages in thread
From: Liu, Yi L @ 2021-10-12 9:04 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
> The next patch adds a struct device to the struct vfio_group, and it is
> confusing/bad practice to have two krefs in the same struct. This kref is
> controlling the period when the vfio_group is registered in sysfs, and
> visible in the internal lookup. Switch it to a refcount_t instead.
>
> The refcount_dec_and_mutex_lock() is still required because we need
> atomicity of the list searches and sysfs presence.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/vfio/vfio.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bf233943dc992f..dbe7edd88ce35c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
> };
>
> struct vfio_group {
> - struct kref kref;
> + refcount_t users;
> int minor;
> atomic_t container_users;
> struct iommu_group *iommu_group;
> @@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> if (!group)
> return ERR_PTR(-ENOMEM);
>
> - kref_init(&group->kref);
> + refcount_set(&group->users, 1);
> INIT_LIST_HEAD(&group->device_list);
> mutex_init(&group->device_lock);
> INIT_LIST_HEAD(&group->unbound_list);
> @@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> return group;
> }
>
> -/* called with vfio.group_lock held */
> -static void vfio_group_release(struct kref *kref)
> +static void vfio_group_put(struct vfio_group *group)
> {
> - struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> + if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
> + return;
>
> WARN_ON(!list_empty(&group->device_list));
> WARN_ON(atomic_read(&group->container_users));
> @@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
> vfio_group_unlock_and_free(group);
> }
>
> -static void vfio_group_put(struct vfio_group *group)
> -{
> - kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
> -}
> -
> -/* Assume group_lock or group reference is held */
> static void vfio_group_get(struct vfio_group *group)
> {
> - kref_get(&group->kref);
> + refcount_inc(&group->users);
> }
>
> static struct vfio_group *vfio_group_get_from_minor(int minor)
> @@ -1662,6 +1656,7 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
> if (ret)
> return ERR_PTR(ret);
>
> + /* Since the caller holds the fget on the file users must be >= 1 */
s/on the file users must be >= 1/on the file, group->users must be >= 1/
Not a native speaker, but that reads more clearly to me. ^_^
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Regards,
Yi Liu
> vfio_group_get(group);
>
> return group;
> --
> 2.33.0
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-12 8:33 ` Tian, Kevin
@ 2021-10-12 12:05 ` Jason Gunthorpe
2021-10-13 1:07 ` Tian, Kevin
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-12 12:05 UTC (permalink / raw)
To: Tian, Kevin
Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Liu, Yi L
On Tue, Oct 12, 2021 at 08:33:49AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, October 2, 2021 7:22 AM
> >
> [...]
>
> > +static void vfio_group_release(struct device *dev)
> > {
> > - struct vfio_group *group, *existing_group;
> > - struct device *dev;
> > - int ret, minor;
> > + struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> > + struct vfio_unbound_dev *unbound, *tmp;
> > +
> > + list_for_each_entry_safe(unbound, tmp,
> > + &group->unbound_list, unbound_next) {
> > + list_del(&unbound->unbound_next);
> > + kfree(unbound);
> > + }
>
> move to vfio_group_put()? this is not paired with vfio_group_alloc()...
Lists are tricky for pairing analysis: vfio_group_alloc() creates
an empty list and release restores the list to empty.
> > static int vfio_group_fops_open(struct inode *inode, struct file *filep)
> > {
> > - struct vfio_group *group;
> > + struct vfio_group *group =
> > + container_of(inode->i_cdev, struct vfio_group, cdev);
> > int opened;
>
> A curiosity question. According to cdev_device_del() any cdev already
> open will remain with their fops callable.
Correct
> What prevents vfio_group from being released after cdev_device_del()
> has been called? Is it because cdev open will hold a reference to
> device thus put_device() will not hit zero in vfio_group_put()?
Yes, that is right. The extra reference is hidden deep inside the FS
code and is actually a reference on the cdev struct, which in turn
holds a kobject parent reference on the struct device. It is
complicated under the covers, but from an API perspective if a struct
file exists then so does the vfio_group.
Jason
* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-12 12:05 ` Jason Gunthorpe
@ 2021-10-13 1:07 ` Tian, Kevin
0 siblings, 0 replies; 28+ messages in thread
From: Tian, Kevin @ 2021-10-13 1:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Liu, Yi L
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, October 12, 2021 8:05 PM
>
> On Tue, Oct 12, 2021 at 08:33:49AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Saturday, October 2, 2021 7:22 AM
> > >
> > [...]
> >
> > > +static void vfio_group_release(struct device *dev)
> > > {
> > > - struct vfio_group *group, *existing_group;
> > > - struct device *dev;
> > > - int ret, minor;
> > > + struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> > > + struct vfio_unbound_dev *unbound, *tmp;
> > > +
> > > + list_for_each_entry_safe(unbound, tmp,
> > > + &group->unbound_list, unbound_next) {
> > > + list_del(&unbound->unbound_next);
> > > + kfree(unbound);
> > > + }
> >
> > move to vfio_group_put()? this is not paired with vfio_group_alloc()...
>
> Lists are tricky for pairing analysis, the vfio_group_alloc() creates
> an empty list and release restores the list to empty.
Items are added to this list after vfio_create_group() (at the start of
vfio_unregister_group_dev()), so I feel it makes more sense to restore
it to empty before put_device() in vfio_group_put(). But not a
strong opinion...
>
> > > static int vfio_group_fops_open(struct inode *inode, struct file *filep)
> > > {
> > > - struct vfio_group *group;
> > > + struct vfio_group *group =
> > > + container_of(inode->i_cdev, struct vfio_group, cdev);
> > > int opened;
> >
> > A curiosity question. According to cdev_device_del() any cdev already
> > open will remain with their fops callable.
>
> Correct
>
> > What prevents vfio_group from being released after cdev_device_del()
> > has been called? Is it because cdev open will hold a reference to
> > device thus put_device() will not hit zero in vfio_group_put()?
>
> Yes, that is right. The extra reference is hidden deep inside the FS
> code and is actually a reference on the cdev struct, which in turn
> holds a kobject parent reference on the struct device. It is
> complicated under the covers, but from an API perspective if a struct
> file exists then so does the vfio_group.
>
Makes sense. Thanks for the explanation.
* Re: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-12 8:57 ` Liu, Yi L
@ 2021-10-13 12:49 ` Jason Gunthorpe
2021-10-13 14:15 ` Liu, Yi L
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2021-10-13 12:49 UTC (permalink / raw)
To: Liu, Yi L
Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin
On Tue, Oct 12, 2021 at 08:57:22AM +0000, Liu, Yi L wrote:
> > + ret = __vfio_group_get_from_iommu(iommu_group);
> > + if (ret)
> > + goto err_unlock;
> >
> > - minor = vfio_alloc_group_minor(group);
> > - if (minor < 0) {
> > - vfio_group_unlock_and_free(group);
> > - return ERR_PTR(minor);
> > + err = cdev_device_add(&group->cdev, &group->dev);
> > + if (err) {
> > + ret = ERR_PTR(err);
> > + goto err_unlock;
>
> should this err branch put the vfio_group reference acquired
> in above __vfio_group_get_from_iommu(iommu_group);?
No, if ret is non-NULL then the reference is returned to the caller via
err_unlock.
The path that reaches cdev_device_add() has ret == NULL, so there is no
reference to put back.
Jason
* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-13 12:49 ` Jason Gunthorpe
@ 2021-10-13 14:15 ` Liu, Yi L
0 siblings, 0 replies; 28+ messages in thread
From: Liu, Yi L @ 2021-10-13 14:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, October 13, 2021 8:49 PM
>
> On Tue, Oct 12, 2021 at 08:57:22AM +0000, Liu, Yi L wrote:
> > > + ret = __vfio_group_get_from_iommu(iommu_group);
> > > + if (ret)
> > > + goto err_unlock;
> > >
> > > - minor = vfio_alloc_group_minor(group);
> > > - if (minor < 0) {
> > > - vfio_group_unlock_and_free(group);
> > > - return ERR_PTR(minor);
> > > + err = cdev_device_add(&group->cdev, &group->dev);
> > > + if (err) {
> > > + ret = ERR_PTR(err);
> > > + goto err_unlock;
> >
> > should this err branch put the vfio_group reference acquired
> > in above __vfio_group_get_from_iommu(iommu_group);?
>
> No, if ret was !NULL then the reference is returned to the caller via
> the err_unlock
>
> This path to cdev_device_add has ret = NULL so there is no reference
> to put back.
Oh, yes. Only when ret is NULL do we go ahead to this point.
Thanks,
Yi Liu
end of thread, other threads:[~2021-10-13 14:16 UTC | newest]
Thread overview: 28+ messages
2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
2021-10-04 22:34 ` Jason Gunthorpe
2021-10-05 4:01 ` Alex Williamson
2021-10-05 16:17 ` Jason Gunthorpe
2021-10-12 6:32 ` Tian, Kevin
2021-10-12 8:51 ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
2021-10-12 6:37 ` Tian, Kevin
2021-10-12 8:52 ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
2021-10-04 22:36 ` Jason Gunthorpe
2021-10-05 4:01 ` Alex Williamson
2021-10-05 14:45 ` Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
2021-10-04 22:25 ` Alex Williamson
2021-10-04 22:39 ` Jason Gunthorpe
2021-10-12 7:08 ` Tian, Kevin
2021-10-12 9:04 ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
2021-10-12 8:33 ` Tian, Kevin
2021-10-12 12:05 ` Jason Gunthorpe
2021-10-13 1:07 ` Tian, Kevin
2021-10-12 8:57 ` Liu, Yi L
2021-10-13 12:49 ` Jason Gunthorpe
2021-10-13 14:15 ` Liu, Yi L