* [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle
@ 2021-10-15 11:40 Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-15 11:40 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime. This simple
pattern ties all the state (vfio, dev, and cdev) together in one memory
structure and uses container_of() to navigate between the layers.
This is a followup to the discussion here:
https://lore.kernel.org/kvm/20210921155705.GN327412@nvidia.com/
This builds on Christoph's work to revise how the vfio_group works and is
against the latest VFIO tree.
This is on github: https://github.com/jgunthorpe/linux/commits/vfio_group_cdev
v3:
- Streamline vfio_group_find_or_alloc()
- Remove vfio_group_try_get() and just opencode the refcount_inc_not_zero
v2: https://lore.kernel.org/r/0-v2-fd9627d27b2b+26c-vfio_group_cdev_jgg@nvidia.com
- Remove comment before iommu_group_unregister_notifier()
- Add comment explaining what the WARN_ONs vfio_group_put() do
- Fix error logic around vfio_create_group() in patch 3
- Add horizontal whitespace
- Clarify comment is refering to group->users
v1: https://lore.kernel.org/r/0-v1-fba989159158+2f9b-vfio_group_cdev_jgg@nvidia.com
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe (5):
vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
vfio: Do not open code the group list search in vfio_create_group()
vfio: Don't leak a group reference if the group already exists
vfio: Use a refcount_t instead of a kref in the vfio_group
vfio: Use cdev_device_add() instead of device_create()
drivers/vfio/vfio.c | 381 +++++++++++++++++---------------------------
1 file changed, 149 insertions(+), 232 deletions(-)
base-commit: d9a0cd510c3383b61db6f70a84e0c3487f836a63
--
2.33.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
@ 2021-10-15 11:40 ` Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-15 11:40 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built using a blocking_notifier_chain which integrates a rwsem. The
notifier function cannot be running outside its registration.
When considering how the notifier function interacts with create/destroy
of the group there are two fringe cases, the notifier starts before
list_add(&vfio.group_list) and the notifier runs after the kref
becomes 0.
Prior to vfio_create_group() unlocking and returning we have
container_users == 0
device_list == empty
And this cannot change until the mutex is unlocked.
After the kref goes to zero we must also have
container_users == 0
device_list == empty
Both are required because they are balanced operations and a 0 kref means
some caller became unbalanced. Add the missing assertion that
container_users must be zero as well.
These two facts are important because when checking each operation we see:
- IOMMU_GROUP_NOTIFY_ADD_DEVICE
Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
0 container_users ends the call
Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty, during
kref == 0 nothing can read this list, and it will be freed soon.
Since the vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree.
This allows deleting all of the deferred group put code.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 100 +++++++-------------------------------------
1 file changed, 15 insertions(+), 85 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 08b27b64f0f935..4ce7e9fe43af95 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -324,12 +324,16 @@ static void vfio_container_put(struct vfio_container *container)
static void vfio_group_unlock_and_free(struct vfio_group *group)
{
+ struct vfio_unbound_dev *unbound, *tmp;
+
mutex_unlock(&vfio.group_lock);
- /*
- * Unregister outside of lock. A spurious callback is harmless now
- * that the group is no longer in vfio.group_list.
- */
iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+
+ list_for_each_entry_safe(unbound, tmp,
+ &group->unbound_list, unbound_next) {
+ list_del(&unbound->unbound_next);
+ kfree(unbound);
+ }
kfree(group);
}
@@ -360,14 +364,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
group->nb.notifier_call = vfio_iommu_group_notifier;
-
- /*
- * blocking notifiers acquire a rwsem around registering and hold
- * it around callback. Therefore, need to register outside of
- * vfio.group_lock to avoid A-B/B-A contention. Our callback won't
- * do anything unless it can find the group in vfio.group_list, so
- * no harm in registering early.
- */
ret = iommu_group_register_notifier(iommu_group, &group->nb);
if (ret) {
kfree(group);
@@ -415,18 +411,18 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
static void vfio_group_release(struct kref *kref)
{
struct vfio_group *group = container_of(kref, struct vfio_group, kref);
- struct vfio_unbound_dev *unbound, *tmp;
struct iommu_group *iommu_group = group->iommu_group;
+ /*
+ * These data structures all have paired operations that can only be
+ * undone when the caller holds a live reference on the group. Since all
+ * pairs must be undone these WARN_ON's indicate some caller did not
+ * properly hold the group reference.
+ */
WARN_ON(!list_empty(&group->device_list));
+ WARN_ON(atomic_read(&group->container_users));
WARN_ON(group->notifier.head);
- list_for_each_entry_safe(unbound, tmp,
- &group->unbound_list, unbound_next) {
- list_del(&unbound->unbound_next);
- kfree(unbound);
- }
-
device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
list_del(&group->vfio_next);
vfio_free_group_minor(group->minor);
@@ -439,61 +435,12 @@ static void vfio_group_put(struct vfio_group *group)
kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
}
-struct vfio_group_put_work {
- struct work_struct work;
- struct vfio_group *group;
-};
-
-static void vfio_group_put_bg(struct work_struct *work)
-{
- struct vfio_group_put_work *do_work;
-
- do_work = container_of(work, struct vfio_group_put_work, work);
-
- vfio_group_put(do_work->group);
- kfree(do_work);
-}
-
-static void vfio_group_schedule_put(struct vfio_group *group)
-{
- struct vfio_group_put_work *do_work;
-
- do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
- if (WARN_ON(!do_work))
- return;
-
- INIT_WORK(&do_work->work, vfio_group_put_bg);
- do_work->group = group;
- schedule_work(&do_work->work);
-}
-
/* Assume group_lock or group reference is held */
static void vfio_group_get(struct vfio_group *group)
{
kref_get(&group->kref);
}
-/*
- * Not really a try as we will sleep for mutex, but we need to make
- * sure the group pointer is valid under lock and get a reference.
- */
-static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
-{
- struct vfio_group *target = group;
-
- mutex_lock(&vfio.group_lock);
- list_for_each_entry(group, &vfio.group_list, vfio_next) {
- if (group == target) {
- vfio_group_get(group);
- mutex_unlock(&vfio.group_lock);
- return group;
- }
- }
- mutex_unlock(&vfio.group_lock);
-
- return NULL;
-}
-
static
struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
{
@@ -691,14 +638,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
struct device *dev = data;
struct vfio_unbound_dev *unbound;
- /*
- * Need to go through a group_lock lookup to get a reference or we
- * risk racing a group being removed. Ignore spurious notifies.
- */
- group = vfio_group_try_get(group);
- if (!group)
- return NOTIFY_OK;
-
switch (action) {
case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
vfio_group_nb_add_dev(group, dev);
@@ -749,15 +688,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
mutex_unlock(&group->unbound_lock);
break;
}
-
- /*
- * If we're the last reference to the group, the group will be
- * released, which includes unregistering the iommu group notifier.
- * We hold a read-lock on that notifier list, unregistering needs
- * a write-lock... deadlock. Release our reference asynchronously
- * to avoid that situation.
- */
- vfio_group_schedule_put(group);
return NOTIFY_OK;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/5] vfio: Do not open code the group list search in vfio_create_group()
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
@ 2021-10-15 11:40 ` Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-15 11:40 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
that vfio_create_group() can call it to consolidate this duplicated code.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
1 file changed, 30 insertions(+), 25 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4ce7e9fe43af95..513fb5a4c102db 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -340,10 +340,35 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
/**
* Group objects - create, release, get, put, search
*/
+static struct vfio_group *
+__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+ struct vfio_group *group;
+
+ list_for_each_entry(group, &vfio.group_list, vfio_next) {
+ if (group->iommu_group == iommu_group) {
+ vfio_group_get(group);
+ return group;
+ }
+ }
+ return NULL;
+}
+
+static struct vfio_group *
+vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+ struct vfio_group *group;
+
+ mutex_lock(&vfio.group_lock);
+ group = __vfio_group_get_from_iommu(iommu_group);
+ mutex_unlock(&vfio.group_lock);
+ return group;
+}
+
static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
enum vfio_group_type type)
{
- struct vfio_group *group, *tmp;
+ struct vfio_group *group, *existing_group;
struct device *dev;
int ret, minor;
@@ -373,12 +398,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
mutex_lock(&vfio.group_lock);
/* Did we race creating this group? */
- list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
- if (tmp->iommu_group == iommu_group) {
- vfio_group_get(tmp);
- vfio_group_unlock_and_free(group);
- return tmp;
- }
+ existing_group = __vfio_group_get_from_iommu(iommu_group);
+ if (existing_group) {
+ vfio_group_unlock_and_free(group);
+ return existing_group;
}
minor = vfio_alloc_group_minor(group);
@@ -441,24 +464,6 @@ static void vfio_group_get(struct vfio_group *group)
kref_get(&group->kref);
}
-static
-struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
-{
- struct vfio_group *group;
-
- mutex_lock(&vfio.group_lock);
- list_for_each_entry(group, &vfio.group_list, vfio_next) {
- if (group->iommu_group == iommu_group) {
- vfio_group_get(group);
- mutex_unlock(&vfio.group_lock);
- return group;
- }
- }
- mutex_unlock(&vfio.group_lock);
-
- return NULL;
-}
-
static struct vfio_group *vfio_group_get_from_minor(int minor)
{
struct vfio_group *group;
--
2.33.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 3/5] vfio: Don't leak a group reference if the group already exists
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
@ 2021-10-15 11:40 ` Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-15 11:40 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.
Change the semantic of vfio_create_group() to not move the reference in
from the caller, but instead obtain a new reference inside and leave the
caller's reference alone. The two callers must now call iommu_group_put().
This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.
Fixes: cba3345cc494 ("vfio: VFIO core")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 513fb5a4c102db..4abb2e5e196536 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -334,6 +334,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
list_del(&unbound->unbound_next);
kfree(unbound);
}
+ iommu_group_put(group->iommu_group);
kfree(group);
}
@@ -385,12 +386,15 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
atomic_set(&group->opened, 0);
init_waitqueue_head(&group->container_q);
group->iommu_group = iommu_group;
+ /* put in vfio_group_unlock_and_free() */
+ iommu_group_ref_get(iommu_group);
group->type = type;
BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
group->nb.notifier_call = vfio_iommu_group_notifier;
ret = iommu_group_register_notifier(iommu_group, &group->nb);
if (ret) {
+ iommu_group_put(iommu_group);
kfree(group);
return ERR_PTR(ret);
}
@@ -426,7 +430,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
list_add(&group->vfio_next, &vfio.group_list);
mutex_unlock(&vfio.group_lock);
-
return group;
}
@@ -434,7 +437,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
static void vfio_group_release(struct kref *kref)
{
struct vfio_group *group = container_of(kref, struct vfio_group, kref);
- struct iommu_group *iommu_group = group->iommu_group;
/*
* These data structures all have paired operations that can only be
@@ -450,7 +452,6 @@ static void vfio_group_release(struct kref *kref)
list_del(&group->vfio_next);
vfio_free_group_minor(group->minor);
vfio_group_unlock_and_free(group);
- iommu_group_put(iommu_group);
}
static void vfio_group_put(struct vfio_group *group)
@@ -735,7 +736,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
ret = PTR_ERR(group);
goto out_remove_device;
}
-
+ iommu_group_put(iommu_group);
return group;
out_remove_device:
@@ -770,18 +771,11 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
if (!iommu_group)
return ERR_PTR(-EINVAL);
- /* a found vfio_group already holds a reference to the iommu_group */
group = vfio_group_get_from_iommu(iommu_group);
- if (group)
- goto out_put;
+ if (!group)
+ group = vfio_create_group(iommu_group, VFIO_IOMMU);
- /* a newly created vfio_group keeps the reference. */
- group = vfio_create_group(iommu_group, VFIO_IOMMU);
- if (IS_ERR(group))
- goto out_put;
- return group;
-
-out_put:
+ /* The vfio_group holds a reference to the iommu_group */
iommu_group_put(iommu_group);
return group;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
` (2 preceding siblings ...)
2021-10-15 11:40 ` [PATCH v3 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
@ 2021-10-15 11:40 ` Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-15 11:40 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref is
controlling the period when the vfio_group is registered in sysfs, and
visible in the internal lookup. Switch it to a refcount_t instead.
The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4abb2e5e196536..e313fa030b9185 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -69,7 +69,7 @@ struct vfio_unbound_dev {
};
struct vfio_group {
- struct kref kref;
+ refcount_t users;
int minor;
atomic_t container_users;
struct iommu_group *iommu_group;
@@ -377,7 +377,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
if (!group)
return ERR_PTR(-ENOMEM);
- kref_init(&group->kref);
+ refcount_set(&group->users, 1);
INIT_LIST_HEAD(&group->device_list);
mutex_init(&group->device_lock);
INIT_LIST_HEAD(&group->unbound_list);
@@ -433,10 +433,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
return group;
}
-/* called with vfio.group_lock held */
-static void vfio_group_release(struct kref *kref)
+static void vfio_group_put(struct vfio_group *group)
{
- struct vfio_group *group = container_of(kref, struct vfio_group, kref);
+ if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
+ return;
/*
* These data structures all have paired operations that can only be
@@ -454,15 +454,9 @@ static void vfio_group_release(struct kref *kref)
vfio_group_unlock_and_free(group);
}
-static void vfio_group_put(struct vfio_group *group)
-{
- kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
-}
-
-/* Assume group_lock or group reference is held */
static void vfio_group_get(struct vfio_group *group)
{
- kref_get(&group->kref);
+ refcount_inc(&group->users);
}
static struct vfio_group *vfio_group_get_from_minor(int minor)
@@ -1657,6 +1651,9 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
if (ret)
return ERR_PTR(ret);
+ /*
+ * Since the caller holds the fget on the file group->users must be >= 1
+ */
vfio_group_get(group);
return group;
--
2.33.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 5/5] vfio: Use cdev_device_add() instead of device_create()
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
` (3 preceding siblings ...)
2021-10-15 11:40 ` [PATCH v3 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
@ 2021-10-15 11:40 ` Jason Gunthorpe
2021-10-15 13:20 ` [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Christoph Hellwig
2021-10-20 19:52 ` Alex Williamson
6 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-15 11:40 UTC (permalink / raw)
To: Alex Williamson, Cornelia Huck, kvm
Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L
Modernize how vfio is creating the group char dev and sysfs presence.
These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.
This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.
Split the code to make this possible:
- vfio_group_alloc()/vfio_group_release() are pair'd functions to
alloc/free the vfio_group. release is done under the struct device
kref.
- vfio_create_group()/vfio_group_put() are pairs that manage the
sysfs/cdev lifetime. Once the uses count is zero the vfio group's
userspace presence is destroyed.
- The IDR is replaced with an IDA. container_of(inode->i_cdev)
is used to get back to the vfio_group during fops open. The IDA
assigns unique minor numbers.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio.c | 193 +++++++++++++++++++++-----------------------
1 file changed, 92 insertions(+), 101 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index e313fa030b9185..82fb75464f923d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -43,9 +43,8 @@ static struct vfio {
struct list_head iommu_drivers_list;
struct mutex iommu_drivers_lock;
struct list_head group_list;
- struct idr group_idr;
- struct mutex group_lock;
- struct cdev group_cdev;
+ struct mutex group_lock; /* locks group_list */
+ struct ida group_ida;
dev_t group_devt;
} vfio;
@@ -69,14 +68,14 @@ struct vfio_unbound_dev {
};
struct vfio_group {
+ struct device dev;
+ struct cdev cdev;
refcount_t users;
- int minor;
atomic_t container_users;
struct iommu_group *iommu_group;
struct vfio_container *container;
struct list_head device_list;
struct mutex device_lock;
- struct device *dev;
struct notifier_block nb;
struct list_head vfio_next;
struct list_head container_next;
@@ -98,6 +97,7 @@ MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode. Thi
#endif
static DEFINE_XARRAY(vfio_device_set_xa);
+static const struct file_operations vfio_group_fops;
int vfio_assign_device_set(struct vfio_device *device, void *set_id)
{
@@ -281,19 +281,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
}
EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
-/**
- * Group minor allocation/free - both called with vfio.group_lock held
- */
-static int vfio_alloc_group_minor(struct vfio_group *group)
-{
- return idr_alloc(&vfio.group_idr, group, 0, MINORMASK + 1, GFP_KERNEL);
-}
-
-static void vfio_free_group_minor(int minor)
-{
- idr_remove(&vfio.group_idr, minor);
-}
-
static int vfio_iommu_group_notifier(struct notifier_block *nb,
unsigned long action, void *data);
static void vfio_group_get(struct vfio_group *group);
@@ -322,22 +309,6 @@ static void vfio_container_put(struct vfio_container *container)
kref_put(&container->kref, vfio_container_release);
}
-static void vfio_group_unlock_and_free(struct vfio_group *group)
-{
- struct vfio_unbound_dev *unbound, *tmp;
-
- mutex_unlock(&vfio.group_lock);
- iommu_group_unregister_notifier(group->iommu_group, &group->nb);
-
- list_for_each_entry_safe(unbound, tmp,
- &group->unbound_list, unbound_next) {
- list_del(&unbound->unbound_next);
- kfree(unbound);
- }
- iommu_group_put(group->iommu_group);
- kfree(group);
-}
-
/**
* Group objects - create, release, get, put, search
*/
@@ -366,71 +337,112 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
return group;
}
-static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
- enum vfio_group_type type)
+static void vfio_group_release(struct device *dev)
{
- struct vfio_group *group, *existing_group;
- struct device *dev;
- int ret, minor;
+ struct vfio_group *group = container_of(dev, struct vfio_group, dev);
+ struct vfio_unbound_dev *unbound, *tmp;
+
+ list_for_each_entry_safe(unbound, tmp,
+ &group->unbound_list, unbound_next) {
+ list_del(&unbound->unbound_next);
+ kfree(unbound);
+ }
+
+ mutex_destroy(&group->device_lock);
+ mutex_destroy(&group->unbound_lock);
+ iommu_group_put(group->iommu_group);
+ ida_free(&vfio.group_ida, MINOR(group->dev.devt));
+ kfree(group);
+}
+
+static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
+ enum vfio_group_type type)
+{
+ struct vfio_group *group;
+ int minor;
group = kzalloc(sizeof(*group), GFP_KERNEL);
if (!group)
return ERR_PTR(-ENOMEM);
+ minor = ida_alloc_max(&vfio.group_ida, MINORMASK, GFP_KERNEL);
+ if (minor < 0) {
+ kfree(group);
+ return ERR_PTR(minor);
+ }
+
+ device_initialize(&group->dev);
+ group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
+ group->dev.class = vfio.class;
+ group->dev.release = vfio_group_release;
+ cdev_init(&group->cdev, &vfio_group_fops);
+ group->cdev.owner = THIS_MODULE;
+
refcount_set(&group->users, 1);
INIT_LIST_HEAD(&group->device_list);
mutex_init(&group->device_lock);
INIT_LIST_HEAD(&group->unbound_list);
mutex_init(&group->unbound_lock);
- atomic_set(&group->container_users, 0);
- atomic_set(&group->opened, 0);
init_waitqueue_head(&group->container_q);
group->iommu_group = iommu_group;
- /* put in vfio_group_unlock_and_free() */
+ /* put in vfio_group_release() */
iommu_group_ref_get(iommu_group);
group->type = type;
BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
+ return group;
+}
+
+static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
+ enum vfio_group_type type)
+{
+ struct vfio_group *group;
+ struct vfio_group *ret;
+ int err;
+
+ group = vfio_group_alloc(iommu_group, type);
+ if (IS_ERR(group))
+ return group;
+
+ err = dev_set_name(&group->dev, "%s%d",
+ group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
+ iommu_group_id(iommu_group));
+ if (err) {
+ ret = ERR_PTR(err);
+ goto err_put;
+ }
+
group->nb.notifier_call = vfio_iommu_group_notifier;
- ret = iommu_group_register_notifier(iommu_group, &group->nb);
- if (ret) {
- iommu_group_put(iommu_group);
- kfree(group);
- return ERR_PTR(ret);
+ err = iommu_group_register_notifier(iommu_group, &group->nb);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto err_put;
}
mutex_lock(&vfio.group_lock);
/* Did we race creating this group? */
- existing_group = __vfio_group_get_from_iommu(iommu_group);
- if (existing_group) {
- vfio_group_unlock_and_free(group);
- return existing_group;
- }
+ ret = __vfio_group_get_from_iommu(iommu_group);
+ if (ret)
+ goto err_unlock;
- minor = vfio_alloc_group_minor(group);
- if (minor < 0) {
- vfio_group_unlock_and_free(group);
- return ERR_PTR(minor);
+ err = cdev_device_add(&group->cdev, &group->dev);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto err_unlock;
}
- dev = device_create(vfio.class, NULL,
- MKDEV(MAJOR(vfio.group_devt), minor), group, "%s%d",
- group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
- iommu_group_id(iommu_group));
- if (IS_ERR(dev)) {
- vfio_free_group_minor(minor);
- vfio_group_unlock_and_free(group);
- return ERR_CAST(dev);
- }
-
- group->minor = minor;
- group->dev = dev;
-
list_add(&group->vfio_next, &vfio.group_list);
mutex_unlock(&vfio.group_lock);
return group;
+
+err_unlock:
+ mutex_unlock(&vfio.group_lock);
+ iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+err_put:
+ put_device(&group->dev);
+ return ret;
}
static void vfio_group_put(struct vfio_group *group)
@@ -448,10 +460,12 @@ static void vfio_group_put(struct vfio_group *group)
WARN_ON(atomic_read(&group->container_users));
WARN_ON(group->notifier.head);
- device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
list_del(&group->vfio_next);
- vfio_free_group_minor(group->minor);
- vfio_group_unlock_and_free(group);
+ cdev_device_del(&group->cdev, &group->dev);
+ mutex_unlock(&vfio.group_lock);
+
+ iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+ put_device(&group->dev);
}
static void vfio_group_get(struct vfio_group *group)
@@ -459,22 +473,6 @@ static void vfio_group_get(struct vfio_group *group)
refcount_inc(&group->users);
}
-static struct vfio_group *vfio_group_get_from_minor(int minor)
-{
- struct vfio_group *group;
-
- mutex_lock(&vfio.group_lock);
- group = idr_find(&vfio.group_idr, minor);
- if (!group) {
- mutex_unlock(&vfio.group_lock);
- return NULL;
- }
- vfio_group_get(group);
- mutex_unlock(&vfio.group_lock);
-
- return group;
-}
-
static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
{
struct iommu_group *iommu_group;
@@ -1479,11 +1477,12 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
static int vfio_group_fops_open(struct inode *inode, struct file *filep)
{
- struct vfio_group *group;
+ struct vfio_group *group =
+ container_of(inode->i_cdev, struct vfio_group, cdev);
int opened;
- group = vfio_group_get_from_minor(iminor(inode));
- if (!group)
+ /* users can be zero if this races with vfio_group_put() */
+ if (!refcount_inc_not_zero(&group->users))
return -ENODEV;
if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
@@ -2293,7 +2292,7 @@ static int __init vfio_init(void)
{
int ret;
- idr_init(&vfio.group_idr);
+ ida_init(&vfio.group_ida);
mutex_init(&vfio.group_lock);
mutex_init(&vfio.iommu_drivers_lock);
INIT_LIST_HEAD(&vfio.group_list);
@@ -2318,11 +2317,6 @@ static int __init vfio_init(void)
if (ret)
goto err_alloc_chrdev;
- cdev_init(&vfio.group_cdev, &vfio_group_fops);
- ret = cdev_add(&vfio.group_cdev, vfio.group_devt, MINORMASK + 1);
- if (ret)
- goto err_cdev_add;
-
pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
#ifdef CONFIG_VFIO_NOIOMMU
@@ -2330,8 +2324,6 @@ static int __init vfio_init(void)
#endif
return 0;
-err_cdev_add:
- unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
err_alloc_chrdev:
class_destroy(vfio.class);
vfio.class = NULL;
@@ -2347,8 +2339,7 @@ static void __exit vfio_cleanup(void)
#ifdef CONFIG_VFIO_NOIOMMU
vfio_unregister_iommu_driver(&vfio_noiommu_ops);
#endif
- idr_destroy(&vfio.group_idr);
- cdev_del(&vfio.group_cdev);
+ ida_destroy(&vfio.group_ida);
unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
class_destroy(vfio.class);
vfio.class = NULL;
--
2.33.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
` (4 preceding siblings ...)
2021-10-15 11:40 ` [PATCH v3 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
@ 2021-10-15 13:20 ` Christoph Hellwig
2021-10-20 19:52 ` Alex Williamson
6 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-10-15 13:20 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Tian,
Kevin, Liu Yi L
On Fri, Oct 15, 2021 at 08:40:49AM -0300, Jason Gunthorpe wrote:
> This builds on Christoph's work to revise how the vfio_group works and is
> against the latest VFIO tree.
Which already is in vfio/next now, btw.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
` (5 preceding siblings ...)
2021-10-15 13:20 ` [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Christoph Hellwig
@ 2021-10-20 19:52 ` Alex Williamson
6 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2021-10-20 19:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L
On Fri, 15 Oct 2021 08:40:49 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> These days drivers with state should use cdev_device_add() and
> cdev_device_del() to manage the cdev and sysfs lifetime. This simple
> pattern ties all the state (vfio, dev, and cdev) together in one memory
> structure and uses container_of() to navigate between the layers.
>
> This is a followup to the discussion here:
>
> https://lore.kernel.org/kvm/20210921155705.GN327412@nvidia.com/
>
> This builds on Christoph's work to revise how the vfio_group works and is
> against the latest VFIO tree.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_group_cdev
>
> v3:
> - Streamline vfio_group_find_or_alloc()
> - Remove vfio_group_try_get() and just opencode the refcount_inc_not_zero
> v2: https://lore.kernel.org/r/0-v2-fd9627d27b2b+26c-vfio_group_cdev_jgg@nvidia.com
> - Remove comment before iommu_group_unregister_notifier()
> - Add comment explaining what the WARN_ONs vfio_group_put() do
> - Fix error logic around vfio_create_group() in patch 3
> - Add horizontal whitespace
> - Clarify comment is refering to group->users
> v1: https://lore.kernel.org/r/0-v1-fba989159158+2f9b-vfio_group_cdev_jgg@nvidia.com
>
> Cc: Liu Yi L <yi.l.liu@intel.com>
> Cc: "Tian, Kevin" <kevin.tian@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> Jason Gunthorpe (5):
> vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
> vfio: Do not open code the group list search in vfio_create_group()
> vfio: Don't leak a group reference if the group already exists
> vfio: Use a refcount_t instead of a kref in the vfio_group
> vfio: Use cdev_device_add() instead of device_create()
>
> drivers/vfio/vfio.c | 381 +++++++++++++++++---------------------------
> 1 file changed, 149 insertions(+), 232 deletions(-)
Applied to vfio next branch for v5.16. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-10-20 19:52 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-15 11:40 [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
2021-10-15 11:40 ` [PATCH v3 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
2021-10-15 13:20 ` [PATCH v3 0/5] Update vfio_group to use the modern cdev lifecycle Christoph Hellwig
2021-10-20 19:52 ` Alex Williamson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).