kvm.vger.kernel.org archive mirror
* [PATCH 0/3] Allow the group FD to remain open when unplugging a device
@ 2022-10-06 12:40 Jason Gunthorpe
  2022-10-06 12:40 ` [PATCH 1/3] vfio: Add vfio_file_is_group() Jason Gunthorpe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-06 12:40 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini
  Cc: Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Matthew Rosato,
	Yi Liu

Testing has shown that virtnodedevd will leave the group FD open for long
periods, even after all the cdevs have been destroyed. This blocks
destruction of the VFIO device and is undesirable.

That approach was selected to accommodate SPAPR, which has a broken
lifecycle model for the iommu_group. However, we can accommodate SPAPR by
realizing that it doesn't use the iommu core at all, so rules about
iommu_group lifetime do not apply to it.

Giving the KVM code its own kref on the iommu_group allows the VFIO core
code to release its iommu_group reference earlier and we can remove the
sleep that only existed for SPAPR.

Jason Gunthorpe (3):
  vfio: Add vfio_file_is_group()
  vfio: Hold a reference to the iommu_group in kvm for SPAPR
  vfio: Make the group FD disassociate from the iommu_group

 drivers/vfio/pci/vfio_pci_core.c |  2 +-
 drivers/vfio/vfio.h              |  1 -
 drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
 include/linux/vfio.h             |  1 +
 virt/kvm/vfio.c                  | 45 +++++++++++-----
 5 files changed, 94 insertions(+), 45 deletions(-)


base-commit: c82e81ab2569559ad873b3061217c2f37560682b
-- 
2.37.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] vfio: Add vfio_file_is_group()
  2022-10-06 12:40 [PATCH 0/3] Allow the group FD to remain open when unplugging a device Jason Gunthorpe
@ 2022-10-06 12:40 ` Jason Gunthorpe
  2022-10-06 18:11   ` Alex Williamson
  2022-10-06 12:40 ` [PATCH 2/3] vfio: Hold a reference to the iommu_group in kvm for SPAPR Jason Gunthorpe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-06 12:40 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini
  Cc: Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Matthew Rosato,
	Yi Liu

This replaces uses of vfio_file_iommu_group() which were only detecting if
the file is a VFIO file with no interest in the actual group.

The only remaining user of vfio_file_iommu_group() is in KVM for the SPAPR
stuff. It passes the iommu_group into the arch code through kvm for some
reason.

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c |  2 +-
 drivers/vfio/vfio_main.c         | 14 ++++++++++++++
 include/linux/vfio.h             |  1 +
 virt/kvm/vfio.c                  | 20 ++++++++++++++++++--
 4 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 59a28251bb0b97..badc9d828cac20 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1313,7 +1313,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		}
 
 		/* Ensure the FD is a vfio group FD.*/
-		if (!vfio_file_iommu_group(file)) {
+		if (!vfio_file_is_group(file)) {
 			fput(file);
 			ret = -EINVAL;
 			break;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 9207e6c0e3cb26..7866849be56ef6 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1553,17 +1553,31 @@ static const struct file_operations vfio_device_fops = {
  * @file: VFIO group file
  *
  * The returned iommu_group is valid as long as a ref is held on the file.
+ * This function is deprecated, only the SPAPR path in kvm should call it.
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
 	struct vfio_group *group = file->private_data;
 
+	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
+		return NULL;
+
 	if (file->f_op != &vfio_group_fops)
 		return NULL;
 	return group->iommu_group;
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
+/**
+ * vfio_file_is_group - True if the file is usable with VFIO APIs
+ * @file: VFIO group file
+ */
+bool vfio_file_is_group(struct file *file)
+{
+	return file->f_op == &vfio_group_fops;
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_group);
+
 /**
  * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
  *        is always CPU cache coherent
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ee399a768070d0..e7cebeb875dd1a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -199,6 +199,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  * External user API
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
+bool vfio_file_is_group(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index ce1b01d02c5197..54aec3b0559c70 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -61,6 +61,23 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
+static bool kvm_vfio_file_is_group(struct file *file)
+{
+	bool (*fn)(struct file *file);
+	bool ret;
+
+	fn = symbol_get(vfio_file_is_group);
+	if (!fn)
+		return false;
+
+	ret = fn(file);
+
+	symbol_put(vfio_file_is_group);
+
+	return ret;
+}
+
+#ifdef CONFIG_SPAPR_TCE_IOMMU
 static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 {
 	struct iommu_group *(*fn)(struct file *file);
@@ -77,7 +94,6 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 	return ret;
 }
 
-#ifdef CONFIG_SPAPR_TCE_IOMMU
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
 					     struct kvm_vfio_group *kvg)
 {
@@ -136,7 +152,7 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 		return -EBADF;
 
 	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_iommu_group(filp)) {
+	if (!kvm_vfio_file_is_group(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.37.3



* [PATCH 2/3] vfio: Hold a reference to the iommu_group in kvm for SPAPR
  2022-10-06 12:40 [PATCH 0/3] Allow the group FD to remain open when unplugging a device Jason Gunthorpe
  2022-10-06 12:40 ` [PATCH 1/3] vfio: Add vfio_file_is_group() Jason Gunthorpe
@ 2022-10-06 12:40 ` Jason Gunthorpe
  2022-10-06 12:40 ` [PATCH 3/3] vfio: Make the group FD disassociate from the iommu_group Jason Gunthorpe
  2022-10-06 19:53 ` [PATCH 0/3] Allow the group FD to remain open when unplugging a device Alex Williamson
  3 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-06 12:40 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini
  Cc: Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Matthew Rosato,
	Yi Liu

SPAPR exists completely outside the normal iommu driver framework; the
groups it creates are fake and exist only to enable VFIO's uAPI.

Thus, it does not need to follow the iommu core rule that the iommu_group
will only be touched while a driver is attached.

Carry a group reference into KVM and have KVM directly manage the lifetime
of this object independently of VFIO. This means KVM no longer relies on
the vfio group file being valid to maintain the group reference.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c |  6 ++++--
 virt/kvm/vfio.c          | 25 ++++++++++++++-----------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 7866849be56ef6..233349867fb36a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1552,8 +1552,9 @@ static const struct file_operations vfio_device_fops = {
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
  *
- * The returned iommu_group is valid as long as a ref is held on the file.
- * This function is deprecated, only the SPAPR path in kvm should call it.
+ * The returned iommu_group is valid as long as a ref is held on the file. This
+ * returns a reference on the group. This function is deprecated, only the SPAPR
+ * path in kvm should call it.
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
@@ -1564,6 +1565,7 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 
 	if (file->f_op != &vfio_group_fops)
 		return NULL;
+	iommu_group_ref_get(group->iommu_group);
 	return group->iommu_group;
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 54aec3b0559c70..495ceabffe88bb 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -24,6 +24,9 @@
 struct kvm_vfio_group {
 	struct list_head node;
 	struct file *file;
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	struct iommu_group *iommu_group;
+#endif
 };
 
 struct kvm_vfio {
@@ -97,12 +100,12 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
 					     struct kvm_vfio_group *kvg)
 {
-	struct iommu_group *grp = kvm_vfio_file_iommu_group(kvg->file);
-
-	if (WARN_ON_ONCE(!grp))
+	if (WARN_ON_ONCE(!kvg->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, grp);
+	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
+	iommu_group_put(kvg->iommu_group);
+	kvg->iommu_group = NULL;
 }
 #endif
 
@@ -252,19 +255,19 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 	mutex_lock(&kv->lock);
 
 	list_for_each_entry(kvg, &kv->group_list, node) {
-		struct iommu_group *grp;
-
 		if (kvg->file != f.file)
 			continue;
 
-		grp = kvm_vfio_file_iommu_group(kvg->file);
-		if (WARN_ON_ONCE(!grp)) {
-			ret = -EIO;
-			goto err_fdput;
+		if (!kvg->iommu_group) {
+			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
+			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+				ret = -EIO;
+				goto err_fdput;
+			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       grp);
+						       kvg->iommu_group);
 		break;
 	}
 
-- 
2.37.3



* [PATCH 3/3] vfio: Make the group FD disassociate from the iommu_group
  2022-10-06 12:40 [PATCH 0/3] Allow the group FD to remain open when unplugging a device Jason Gunthorpe
  2022-10-06 12:40 ` [PATCH 1/3] vfio: Add vfio_file_is_group() Jason Gunthorpe
  2022-10-06 12:40 ` [PATCH 2/3] vfio: Hold a reference to the iommu_group in kvm for SPAPR Jason Gunthorpe
@ 2022-10-06 12:40 ` Jason Gunthorpe
  2022-10-06 19:53 ` [PATCH 0/3] Allow the group FD to remain open when unplugging a device Alex Williamson
  3 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-06 12:40 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini
  Cc: Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Matthew Rosato,
	Yi Liu

Allow the vfio_group struct to exist with a NULL iommu_group pointer. When
the pointer is NULL the vfio_group users promise not to touch the
iommu_group. This allows a driver to be hot unplugged while userspace is
keeping the group FD open.

Remove all the code waiting for the group FD to close.

This fixes a userspace regression where we learned that virtnodedevd
leaves a group FD open even though the /dev/ node for it has been deleted
and all the drivers for it unplugged.

Fixes: ca5f21b25749 ("vfio: Follow a strict lifetime for struct iommu_group")
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.h      |  1 -
 drivers/vfio/vfio_main.c | 74 ++++++++++++++++++++++++----------------
 2 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 4a1bac1359a952..bcad54bbab08c4 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -59,7 +59,6 @@ struct vfio_group {
 	struct mutex			group_lock;
 	struct kvm			*kvm;
 	struct file			*opened_file;
-	struct swait_queue_head		opened_file_wait;
 	struct blocking_notifier_head	notifier;
 };
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 233349867fb36a..360742cafe3a1f 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -133,6 +133,10 @@ __vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 {
 	struct vfio_group *group;
 
+	/*
+	 * group->iommu_group from the vfio.group_list cannot be NULL
+	 * under the vfio.group_lock.
+	 */
 	list_for_each_entry(group, &vfio.group_list, vfio_next) {
 		if (group->iommu_group == iommu_group) {
 			refcount_inc(&group->drivers);
@@ -159,7 +163,7 @@ static void vfio_group_release(struct device *dev)
 
 	mutex_destroy(&group->device_lock);
 	mutex_destroy(&group->group_lock);
-	iommu_group_put(group->iommu_group);
+	WARN_ON(group->iommu_group);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
 }
@@ -189,7 +193,6 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 
 	refcount_set(&group->drivers, 1);
 	mutex_init(&group->group_lock);
-	init_swait_queue_head(&group->opened_file_wait);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
 	group->iommu_group = iommu_group;
@@ -248,6 +251,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 static void vfio_device_remove_group(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
+	struct iommu_group *iommu_group;
 
 	if (group->type == VFIO_NO_IOMMU || group->type == VFIO_EMULATED_IOMMU)
 		iommu_group_remove_device(device->dev);
@@ -265,31 +269,29 @@ static void vfio_device_remove_group(struct vfio_device *device)
 	 */
 	cdev_device_del(&group->cdev, &group->dev);
 
-	/*
-	 * Before we allow the last driver in the group to be unplugged the
-	 * group must be sanitized so nothing else is or can reference it. This
-	 * is because the group->iommu_group pointer should only be used so long
-	 * as a device driver is attached to a device in the group.
-	 */
-	while (group->opened_file) {
-		mutex_unlock(&vfio.group_lock);
-		swait_event_idle_exclusive(group->opened_file_wait,
-					   !group->opened_file);
-		mutex_lock(&vfio.group_lock);
-	}
-	mutex_unlock(&vfio.group_lock);
-
+	mutex_lock(&group->group_lock);
 	/*
 	 * These data structures all have paired operations that can only be
-	 * undone when the caller holds a live reference on the group. Since all
-	 * pairs must be undone these WARN_ON's indicate some caller did not
+	 * undone when the caller holds a live reference on the device. Since
+	 * all pairs must be undone these WARN_ON's indicate some caller did not
 	 * properly hold the group reference.
 	 */
 	WARN_ON(!list_empty(&group->device_list));
-	WARN_ON(group->container || group->container_users);
 	WARN_ON(group->notifier.head);
+
+	/*
+	 * Revoke all users of group->iommu_group. At this point we know there
+	 * are no devices active because we are unplugging the last one. Setting
+	 * iommu_group to NULL blocks all new users.
+	 */
+	if (group->container)
+		vfio_group_detach_container(group);
+	iommu_group = group->iommu_group;
 	group->iommu_group = NULL;
+	mutex_unlock(&group->group_lock);
+	mutex_unlock(&vfio.group_lock);
 
+	iommu_group_put(iommu_group);
 	put_device(&group->dev);
 }
 
@@ -531,6 +533,10 @@ static int __vfio_register_dev(struct vfio_device *device,
 
 	existing_device = vfio_group_get_device(group, device->dev);
 	if (existing_device) {
+		/*
+		 * group->iommu_group is non-NULL because we hold the drivers
+		 * refcount.
+		 */
 		dev_WARN(device->dev, "Device already exists on group %d\n",
 			 iommu_group_id(group->iommu_group));
 		vfio_device_put_registration(existing_device);
@@ -702,6 +708,11 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 		ret = -EINVAL;
 		goto out_unlock;
 	}
+	if (!group->iommu_group) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
 	container = vfio_container_from_file(f.file);
 	ret = -EINVAL;
 	if (container) {
@@ -862,6 +873,11 @@ static int vfio_group_ioctl_get_status(struct vfio_group *group,
 	status.flags = 0;
 
 	mutex_lock(&group->group_lock);
+	if (!group->iommu_group) {
+		mutex_unlock(&group->group_lock);
+		return -ENODEV;
+	}
+
 	if (group->container)
 		status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
 				VFIO_GROUP_FLAGS_VIABLE;
@@ -938,17 +954,8 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 	filep->private_data = NULL;
 
 	mutex_lock(&group->group_lock);
-	/*
-	 * Device FDs hold a group file reference, therefore the group release
-	 * is only called when there are no open devices.
-	 */
-	WARN_ON(group->notifier.head);
-	if (group->container)
-		vfio_group_detach_container(group);
 	group->opened_file = NULL;
 	mutex_unlock(&group->group_lock);
-	swake_up_one(&group->opened_file_wait);
-
 	return 0;
 }
 
@@ -1559,14 +1566,21 @@ static const struct file_operations vfio_device_fops = {
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
 	struct vfio_group *group = file->private_data;
+	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
 	if (file->f_op != &vfio_group_fops)
 		return NULL;
-	iommu_group_ref_get(group->iommu_group);
-	return group->iommu_group;
+
+	mutex_lock(&group->group_lock);
+	if (group->iommu_group) {
+		iommu_group = group->iommu_group;
+		iommu_group_ref_get(iommu_group);
+	}
+	mutex_unlock(&group->group_lock);
+	return iommu_group;
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
-- 
2.37.3



* Re: [PATCH 1/3] vfio: Add vfio_file_is_group()
  2022-10-06 12:40 ` [PATCH 1/3] vfio: Add vfio_file_is_group() Jason Gunthorpe
@ 2022-10-06 18:11   ` Alex Williamson
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Williamson @ 2022-10-06 18:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Paolo Bonzini, Christian Borntraeger,
	Christian Borntraeger, Qian Cai, Eric Farman, Joerg Roedel,
	Marek Szyprowski, Matthew Rosato, Yi Liu

On Thu,  6 Oct 2022 09:40:36 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> This replaces uses of vfio_file_iommu_group() which were only detecting if
> the file is a VFIO file with no interest in the actual group.
> 
> The only remaining user of vfio_file_iommu_group() is in KVM for the SPAPR
> stuff. It passes the iommu_group into the arch code through kvm for some
> reason.
> 
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Tested-by: Eric Farman <farman@linux.ibm.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c |  2 +-
>  drivers/vfio/vfio_main.c         | 14 ++++++++++++++
>  include/linux/vfio.h             |  1 +
>  virt/kvm/vfio.c                  | 20 ++++++++++++++++++--
>  4 files changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 59a28251bb0b97..badc9d828cac20 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1313,7 +1313,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
>  		}
>  
>  		/* Ensure the FD is a vfio group FD.*/
> -		if (!vfio_file_iommu_group(file)) {
> +		if (!vfio_file_is_group(file)) {
>  			fput(file);
>  			ret = -EINVAL;
>  			break;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 9207e6c0e3cb26..7866849be56ef6 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1553,17 +1553,31 @@ static const struct file_operations vfio_device_fops = {
>   * @file: VFIO group file
>   *
>   * The returned iommu_group is valid as long as a ref is held on the file.
> + * This function is deprecated, only the SPAPR path in kvm should call it.
>   */
>  struct iommu_group *vfio_file_iommu_group(struct file *file)
>  {
>  	struct vfio_group *group = file->private_data;
>  
> +	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
> +		return NULL;
> +
>  	if (file->f_op != &vfio_group_fops)

Nit, with the function below, shouldn't the line above become:

	if (!vfio_file_is_group(file))

Thanks,
Alex

>  		return NULL;
>  	return group->iommu_group;
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
>  
> +/**
> + * vfio_file_is_group - True if the file is usable with VFIO APIs
> + * @file: VFIO group file
> + */
> +bool vfio_file_is_group(struct file *file)
> +{
> +	return file->f_op == &vfio_group_fops;
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_is_group);
> +
>  /**
>   * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
>   *        is always CPU cache coherent
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ee399a768070d0..e7cebeb875dd1a 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -199,6 +199,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>   * External user API
>   */
>  struct iommu_group *vfio_file_iommu_group(struct file *file);
> +bool vfio_file_is_group(struct file *file);
>  bool vfio_file_enforced_coherent(struct file *file);
>  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
>  bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index ce1b01d02c5197..54aec3b0559c70 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -61,6 +61,23 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
>  	return ret;
>  }
>  
> +static bool kvm_vfio_file_is_group(struct file *file)
> +{
> +	bool (*fn)(struct file *file);
> +	bool ret;
> +
> +	fn = symbol_get(vfio_file_is_group);
> +	if (!fn)
> +		return false;
> +
> +	ret = fn(file);
> +
> +	symbol_put(vfio_file_is_group);
> +
> +	return ret;
> +}
> +
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>  static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
>  {
>  	struct iommu_group *(*fn)(struct file *file);
> @@ -77,7 +94,6 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
>  	return ret;
>  }
>  
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
>  static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
>  					     struct kvm_vfio_group *kvg)
>  {
> @@ -136,7 +152,7 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
>  		return -EBADF;
>  
>  	/* Ensure the FD is a vfio group FD.*/
> -	if (!kvm_vfio_file_iommu_group(filp)) {
> +	if (!kvm_vfio_file_is_group(filp)) {
>  		ret = -EINVAL;
>  		goto err_fput;
>  	}



* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-06 12:40 [PATCH 0/3] Allow the group FD to remain open when unplugging a device Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2022-10-06 12:40 ` [PATCH 3/3] vfio: Make the group FD disassociate from the iommu_group Jason Gunthorpe
@ 2022-10-06 19:53 ` Alex Williamson
  2022-10-06 22:42   ` Jason Gunthorpe
  3 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2022-10-06 19:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Paolo Bonzini, Christian Borntraeger,
	Christian Borntraeger, Qian Cai, Eric Farman, Joerg Roedel,
	Marek Szyprowski, Matthew Rosato, Yi Liu

On Thu,  6 Oct 2022 09:40:35 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Testing has shown that virtnodedevd will leave the group FD open for long
> periods, even after all the cdevs have been destroyed. This blocks
> destruction of the VFIO device and is undesirable.
> 
> That approach was selected to accommodate SPAPR, which has a broken
> lifecycle model for the iommu_group. However, we can accommodate SPAPR by
> realizing that it doesn't use the iommu core at all, so rules about
> iommu_group lifetime do not apply to it.
> 
> Giving the KVM code its own kref on the iommu_group allows the VFIO core
> code to release its iommu_group reference earlier and we can remove the
> sleep that only existed for SPAPR.
> 
> Jason Gunthorpe (3):
>   vfio: Add vfio_file_is_group()
>   vfio: Hold a reference to the iommu_group in kvm for SPAPR
>   vfio: Make the group FD disassociate from the iommu_group
> 
>  drivers/vfio/pci/vfio_pci_core.c |  2 +-
>  drivers/vfio/vfio.h              |  1 -
>  drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
>  include/linux/vfio.h             |  1 +
>  virt/kvm/vfio.c                  | 45 +++++++++++-----
>  5 files changed, 94 insertions(+), 45 deletions(-)

Containers aren't getting cleaned up with this series, starting and
shutting down a libvirt managed VM with vfio-pci devices, an mtty mdev
device, and making use of hugepages, /proc/meminfo shows the hugepages
are not released on VM shutdown and I'm unable to subsequently restart
the VM. Thanks,

Alex



* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-06 19:53 ` [PATCH 0/3] Allow the group FD to remain open when unplugging a device Alex Williamson
@ 2022-10-06 22:42   ` Jason Gunthorpe
  2022-10-06 23:28     ` Matthew Rosato
  2022-10-07 11:17     ` Christian Borntraeger
  0 siblings, 2 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-06 22:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, Paolo Bonzini, Christian Borntraeger,
	Christian Borntraeger, Qian Cai, Eric Farman, Joerg Roedel,
	Marek Szyprowski, Matthew Rosato, Yi Liu

On Thu, Oct 06, 2022 at 01:53:15PM -0600, Alex Williamson wrote:
> On Thu,  6 Oct 2022 09:40:35 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > Testing has shown that virtnodedevd will leave the group FD open for long
> > periods, even after all the cdevs have been destroyed. This blocks
> > destruction of the VFIO device and is undesirable.
> > 
> > That approach was selected to accommodate SPAPR, which has a broken
> > lifecycle model for the iommu_group. However, we can accommodate SPAPR by
> > realizing that it doesn't use the iommu core at all, so rules about
> > iommu_group lifetime do not apply to it.
> > 
> > Giving the KVM code its own kref on the iommu_group allows the VFIO core
> > code to release its iommu_group reference earlier and we can remove the
> > sleep that only existed for SPAPR.
> > 
> > Jason Gunthorpe (3):
> >   vfio: Add vfio_file_is_group()
> >   vfio: Hold a reference to the iommu_group in kvm for SPAPR
> >   vfio: Make the group FD disassociate from the iommu_group
> > 
> >  drivers/vfio/pci/vfio_pci_core.c |  2 +-
> >  drivers/vfio/vfio.h              |  1 -
> >  drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
> >  include/linux/vfio.h             |  1 +
> >  virt/kvm/vfio.c                  | 45 +++++++++++-----
> >  5 files changed, 94 insertions(+), 45 deletions(-)
> 
> Containers aren't getting cleaned up with this series, starting and
> shutting down a libvirt managed VM with vfio-pci devices, an mtty mdev
> device, and making use of hugepages, /proc/meminfo shows the hugepages
> are not released on VM shutdown and I'm unable to subsequently restart
> the VM. Thanks,

Oh, I'm surprised the s390 testing didn't hit this!!

This hunk should remain since not all group FD closes are due to device
hot unplug:

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f9cb734d3991b3..62aba3a128fb8d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -954,6 +954,13 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 	filep->private_data = NULL;
 
 	mutex_lock(&group->group_lock);
+	/*
+	 * Device FDs hold a group file reference, therefore the group release
+	 * is only called when there are no open devices.
+	 */
+	WARN_ON(group->notifier.head);
+	if (group->container)
+		vfio_group_detach_container(group);
 	group->opened_file = NULL;
 	mutex_unlock(&group->group_lock);
 	return 0;


* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-06 22:42   ` Jason Gunthorpe
@ 2022-10-06 23:28     ` Matthew Rosato
  2022-10-07  1:46       ` Matthew Rosato
  2022-10-07 13:37       ` Jason Gunthorpe
  2022-10-07 11:17     ` Christian Borntraeger
  1 sibling, 2 replies; 14+ messages in thread
From: Matthew Rosato @ 2022-10-06 23:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Cornelia Huck, kvm, Paolo Bonzini, Christian Borntraeger,
	Christian Borntraeger, Qian Cai, Eric Farman, Joerg Roedel,
	Marek Szyprowski, Yi Liu

On 10/6/22 6:42 PM, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 01:53:15PM -0600, Alex Williamson wrote:
>> On Thu,  6 Oct 2022 09:40:35 -0300
>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>>> Testing has shown that virtnodedevd will leave the group FD open for long
>>> periods, even after all the cdevs have been destroyed. This blocks
>>> destruction of the VFIO device and is undesirable.
>>>
>>> That approach was selected to accommodate SPAPR, which has a broken
>>> lifecycle model for the iommu_group. However, we can accommodate SPAPR by
>>> realizing that it doesn't use the iommu core at all, so rules about
>>> iommu_group lifetime do not apply to it.
>>>
>>> Giving the KVM code its own kref on the iommu_group allows the VFIO core
>>> code to release its iommu_group reference earlier and we can remove the
>>> sleep that only existed for SPAPR.
>>>
>>> Jason Gunthorpe (3):
>>>   vfio: Add vfio_file_is_group()
>>>   vfio: Hold a reference to the iommu_group in kvm for SPAPR
>>>   vfio: Make the group FD disassociate from the iommu_group
>>>
>>>  drivers/vfio/pci/vfio_pci_core.c |  2 +-
>>>  drivers/vfio/vfio.h              |  1 -
>>>  drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
>>>  include/linux/vfio.h             |  1 +
>>>  virt/kvm/vfio.c                  | 45 +++++++++++-----
>>>  5 files changed, 94 insertions(+), 45 deletions(-)
>>
>> Containers aren't getting cleaned up with this series, starting and
>> shutting down a libvirt managed VM with vfio-pci devices, an mtty mdev
>> device, and making use of hugepages, /proc/meminfo shows the hugepages
>> are not released on VM shutdown and I'm unable to subsequently restart
>> the VM. Thanks,
> 
> Oh, I'm surprised the s390 testing didn't hit this!!

Huh, me too, at least eventually - I think it's because we aren't pinning everything upfront but rather on-demand, so the missing type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.  I definitely did multiple VM (re)starts and hot (un)plugs.  But while my test workloads did some I/O, the long-running one was focused on the plug/unplug scenarios to recreate the initial issue, so the I/O (and thus pinning) done would have been minimal.

-ap and -ccw also don't pin everything upfront (and I did far less testing with those).

Ugh.  Moving forward, it might be worth seeing how I can fold some non-s390-specific vfio testing into my routine.

> 
> This hunk should remain since not all group FD closes are due to device
> hot unplug:
> 
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index f9cb734d3991b3..62aba3a128fb8d 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -954,6 +954,13 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
>  	filep->private_data = NULL;
>  
>  	mutex_lock(&group->group_lock);
> +	/*
> +	 * Device FDs hold a group file reference, therefore the group release
> +	 * is only called when there are no open devices.
> +	 */
> +	WARN_ON(group->notifier.head);
> +	if (group->container)
> +		vfio_group_detach_container(group);
>  	group->opened_file = NULL;
>  	mutex_unlock(&group->group_lock);
>  	return 0;

Anyway, FWIW, I folded this in and re-ran a brief series of -pci, -ccw and -ap tests on s390 and things still look good.  For completeness I'll start some longer-running pci tests next but I expect this will still be fine as well.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-06 23:28     ` Matthew Rosato
@ 2022-10-07  1:46       ` Matthew Rosato
  2022-10-07 13:37       ` Jason Gunthorpe
  1 sibling, 0 replies; 14+ messages in thread
From: Matthew Rosato @ 2022-10-07  1:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Cornelia Huck, kvm, Paolo Bonzini, Christian Borntraeger,
	Christian Borntraeger, Qian Cai, Eric Farman, Joerg Roedel,
	Marek Szyprowski, Yi Liu

On 10/6/22 7:28 PM, Matthew Rosato wrote:
> On 10/6/22 6:42 PM, Jason Gunthorpe wrote:
>> On Thu, Oct 06, 2022 at 01:53:15PM -0600, Alex Williamson wrote:
>>> On Thu,  6 Oct 2022 09:40:35 -0300
>>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>
>>>> Testing has shown that virtnodedevd will leave the group FD open for long
>>>> periods, even after all the cdevs have been destroyed. This blocks
>>>> destruction of the VFIO device and is undesirable.
>>>>
>>>> That approach was selected to accommodate SPAPR, which has a broken
>>>> lifecycle model for the iommu_group. However, we can accommodate SPAPR by
>>>> realizing that it doesn't use the iommu core at all, so rules about
>>>> iommu_group lifetime do not apply to it.
>>>>
>>>> Giving the KVM code its own kref on the iommu_group allows the VFIO core
>>>> code to release its iommu_group reference earlier and we can remove the
>>>> sleep that only existed for SPAPR.
>>>>
>>>> Jason Gunthorpe (3):
>>>>   vfio: Add vfio_file_is_group()
>>>>   vfio: Hold a reference to the iommu_group in kvm for SPAPR
>>>>   vfio: Make the group FD disassociate from the iommu_group
>>>>
>>>>  drivers/vfio/pci/vfio_pci_core.c |  2 +-
>>>>  drivers/vfio/vfio.h              |  1 -
>>>>  drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
>>>>  include/linux/vfio.h             |  1 +
>>>>  virt/kvm/vfio.c                  | 45 +++++++++++-----
>>>>  5 files changed, 94 insertions(+), 45 deletions(-)
>>>
>>> Containers aren't getting cleaned up with this series, starting and
>>> shutting down a libvirt managed VM with vfio-pci devices, an mtty mdev
>>> device, and making use of hugepages, /proc/meminfo shows the hugepages
>>> are not released on VM shutdown and I'm unable to subsequently restart
>>> the VM. Thanks,
>>
>> Oh, I'm surprised the s390 testing didn't hit this!!
> 
> Huh, me too, at least eventually - I think it's because we aren't pinning everything upfront but rather on-demand, so the missing type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.  I definitely did multiple VM (re)starts and hot (un)plugs.  But while my test workloads did some I/O, the long-running one was focused on the plug/unplug scenarios to recreate the initial issue, so the I/O (and thus pinning) done would have been minimal.
> 
> -ap and -ccw also don't pin everything upfront (and I did far less testing with those).
> 
> Ugh.  Moving forward, might be worth seeing how I can loop some non-s390-specific vfio testing into my routine.
> 
>>
>> This hunk should remain since not all cases are closures due to device
>> hot unplug
>>
>> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
>> index f9cb734d3991b3..62aba3a128fb8d 100644
>> --- a/drivers/vfio/vfio_main.c
>> +++ b/drivers/vfio/vfio_main.c
>> @@ -954,6 +954,13 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
>>  	filep->private_data = NULL;
>>  
>>  	mutex_lock(&group->group_lock);
>> +	/*
>> +	 * Device FDs hold a group file reference, therefore the group release
>> +	 * is only called when there are no open devices.
>> +	 */
>> +	WARN_ON(group->notifier.head);
>> +	if (group->container)
>> +		vfio_group_detach_container(group);
>>  	group->opened_file = NULL;
>>  	mutex_unlock(&group->group_lock);
>>  	return 0;
> 
> Anyway, FWIW, I folded this in and re-ran a brief series of -pci, -ccw and -ap tests on s390 and things still look good.  For completeness I'll start some longer-running pci tests next but I expect this will still be fine as well.

Further s390 vfio-pci testing still looks good with this change too.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-06 22:42   ` Jason Gunthorpe
  2022-10-06 23:28     ` Matthew Rosato
@ 2022-10-07 11:17     ` Christian Borntraeger
  1 sibling, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2022-10-07 11:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Cornelia Huck, kvm, Paolo Bonzini, Christian Borntraeger,
	Qian Cai, Eric Farman, Joerg Roedel, Marek Szyprowski,
	Matthew Rosato, Yi Liu



Am 07.10.22 um 00:42 schrieb Jason Gunthorpe:
> On Thu, Oct 06, 2022 at 01:53:15PM -0600, Alex Williamson wrote:
>> On Thu,  6 Oct 2022 09:40:35 -0300
>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>>> Testing has shown that virtnodedevd will leave the group FD open for long
>>> periods, even after all the cdevs have been destroyed. This blocks
>>> destruction of the VFIO device and is undesirable.
>>>
>>> That approach was selected to accommodate SPAPR, which has a broken
>>> lifecycle model for the iommu_group. However, we can accommodate SPAPR by
>>> realizing that it doesn't use the iommu core at all, so rules about
>>> iommu_group lifetime do not apply to it.
>>>
>>> Giving the KVM code its own kref on the iommu_group allows the VFIO core
>>> code to release its iommu_group reference earlier and we can remove the
>>> sleep that only existed for SPAPR.
>>>
>>> Jason Gunthorpe (3):
>>>    vfio: Add vfio_file_is_group()
>>>    vfio: Hold a reference to the iommu_group in kvm for SPAPR
>>>    vfio: Make the group FD disassociate from the iommu_group
>>>
>>>   drivers/vfio/pci/vfio_pci_core.c |  2 +-
>>>   drivers/vfio/vfio.h              |  1 -
>>>   drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
>>>   include/linux/vfio.h             |  1 +
>>>   virt/kvm/vfio.c                  | 45 +++++++++++-----
>>>   5 files changed, 94 insertions(+), 45 deletions(-)
>>
>> Containers aren't getting cleaned up with this series, starting and
>> shutting down a libvirt managed VM with vfio-pci devices, an mtty mdev
>> device, and making use of hugepages, /proc/meminfo shows the hugepages
>> are not released on VM shutdown and I'm unable to subsequently restart
>> the VM. Thanks,
> 
> Oh, I'm surprised the s390 testing didn't hit this!!

I guess it's because most of our CI testcases create a new ephemeral
guest for each testcase. We do test reboot but not shutdown and restart.
Will have a look if that can be improved.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-06 23:28     ` Matthew Rosato
  2022-10-07  1:46       ` Matthew Rosato
@ 2022-10-07 13:37       ` Jason Gunthorpe
  2022-10-07 14:37         ` Matthew Rosato
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-07 13:37 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini,
	Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Yi Liu

On Thu, Oct 06, 2022 at 07:28:53PM -0400, Matthew Rosato wrote:

> > Oh, I'm surprised the s390 testing didn't hit this!!
> 
> Huh, me too, at least eventually - I think it's because we aren't
> pinning everything upfront but rather on-demand, so the missing
> type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.
> I definitely did multiple VM (re)starts and hot (un)plugs.  But
> while my test workloads did some I/O, the long-running one was
> focused on the plug/unplug scenarios to recreate the initial issue
> so the I/O (and thus pinning) done would have been minimal.

That explains ccw/ap a bit, but for PCI the iommu ownership wasn't
released, so it becomes impossible to re-attach a container to the
group, e.g. a 2nd VM can never be started.

Ah well, thanks!

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-07 13:37       ` Jason Gunthorpe
@ 2022-10-07 14:37         ` Matthew Rosato
  2022-10-07 14:39           ` Jason Gunthorpe
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Rosato @ 2022-10-07 14:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini,
	Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Yi Liu

On 10/7/22 9:37 AM, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 07:28:53PM -0400, Matthew Rosato wrote:
> 
>>> Oh, I'm surprised the s390 testing didn't hit this!!
>>
>> Huh, me too, at least eventually - I think it's because we aren't
> >> pinning everything upfront but rather on-demand, so the missing
>> type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.
>> I definitely did multiple VM (re)starts and hot (un)plugs.  But
>> while my test workloads did some I/O, the long-running one was
>> focused on the plug/unplug scenarios to recreate the initial issue
>> so the I/O (and thus pinning) done would have been minimal.
> 
> > That explains ccw/ap a bit, but for PCI the iommu ownership wasn't
> > released, so it becomes impossible to re-attach a container to the
> > group, e.g. a 2nd VM can never be started.
> 
> Ah well, thanks!
> 
> Jason

Well, this bugged me enough that I traced the v1 series without the fixup, and vfio-pci on s390 was OK because it was still calling detach_container on VM shutdown via this chain:

vfio_pci_remove
 vfio_pci_core_unregister_device
  vfio_unregister_group_dev
   vfio_device_remove_group
    vfio_group_detach_container

I'd guess non-s390 vfio-pci would do the same.  Alex also had the mtty mdev, maybe that's relevant.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-07 14:37         ` Matthew Rosato
@ 2022-10-07 14:39           ` Jason Gunthorpe
  2022-10-07 14:46             ` Matthew Rosato
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2022-10-07 14:39 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini,
	Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Yi Liu

On Fri, Oct 07, 2022 at 10:37:11AM -0400, Matthew Rosato wrote:
> On 10/7/22 9:37 AM, Jason Gunthorpe wrote:
> > On Thu, Oct 06, 2022 at 07:28:53PM -0400, Matthew Rosato wrote:
> > 
> >>> Oh, I'm surprised the s390 testing didn't hit this!!
> >>
> >> Huh, me too, at least eventually - I think it's because we aren't
> >> pinning everything upfront but rather on-demand, so the missing
> >> type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.
> >> I definitely did multiple VM (re)starts and hot (un)plugs.  But
> >> while my test workloads did some I/O, the long-running one was
> >> focused on the plug/unplug scenarios to recreate the initial issue
> >> so the I/O (and thus pinning) done would have been minimal.
> > 
> > That explains ccw/ap a bit, but for PCI the iommu ownership wasn't
> > released, so it becomes impossible to re-attach a container to the
> > group, e.g. a 2nd VM can never be started.
> > 
> > Ah well, thanks!
> > 
> > Jason
> 
> Well, this bugged me enough that I traced the v1 series without the fixup, and vfio-pci on s390 was OK because it was still calling detach_container on VM shutdown via this chain:
> 
> vfio_pci_remove
>  vfio_pci_core_unregister_device
>   vfio_unregister_group_dev
>    vfio_device_remove_group
>     vfio_group_detach_container
> 
> I'd guess non-s390 vfio-pci would do the same.  Alex also had the mtty mdev, maybe that's relevant.

As long as you are unplugging a driver the v1 series would work. The
failure mode is when you don't unplug the driver and just run a VM
twice in a row.

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device
  2022-10-07 14:39           ` Jason Gunthorpe
@ 2022-10-07 14:46             ` Matthew Rosato
  0 siblings, 0 replies; 14+ messages in thread
From: Matthew Rosato @ 2022-10-07 14:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, kvm, Paolo Bonzini,
	Christian Borntraeger, Christian Borntraeger, Qian Cai,
	Eric Farman, Joerg Roedel, Marek Szyprowski, Yi Liu

On 10/7/22 10:39 AM, Jason Gunthorpe wrote:
> On Fri, Oct 07, 2022 at 10:37:11AM -0400, Matthew Rosato wrote:
>> On 10/7/22 9:37 AM, Jason Gunthorpe wrote:
>>> On Thu, Oct 06, 2022 at 07:28:53PM -0400, Matthew Rosato wrote:
>>>
>>>>> Oh, I'm surprised the s390 testing didn't hit this!!
>>>>
>>>> Huh, me too, at least eventually - I think it's because we aren't
>>>> pinning everything upfront but rather on-demand, so the missing
>>>> type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.
>>>> I definitely did multiple VM (re)starts and hot (un)plugs.  But
>>>> while my test workloads did some I/O, the long-running one was
>>>> focused on the plug/unplug scenarios to recreate the initial issue
>>>> so the I/O (and thus pinning) done would have been minimal.
>>>
>>> That explains ccw/ap a bit, but for PCI the iommu ownership wasn't
>>> released, so it becomes impossible to re-attach a container to the
>>> group, e.g. a 2nd VM can never be started.
>>>
>>> Ah well, thanks!
>>>
>>> Jason
>>
>> Well, this bugged me enough that I traced the v1 series without the fixup, and vfio-pci on s390 was OK because it was still calling detach_container on VM shutdown via this chain:
>>
>> vfio_pci_remove
>>  vfio_pci_core_unregister_device
>>   vfio_unregister_group_dev
>>    vfio_device_remove_group
>>     vfio_group_detach_container
>>
>> I'd guess non-s390 vfio-pci would do the same.  Alex also had the mtty mdev, maybe that's relevant.
> 
> As long as you are unplugging a driver the v1 series would work. The
> failure mode is when you don't unplug the driver and just run a VM
> twice in a row.
> 
> Jason

Oh, duh - And yep, all of my tests are using managed libvirt, so it's unbinding from vfio-pci back to the default host driver on VM shutdown.

OK, if I force the point and leave vfio-pci bound, the 2nd guest boot indeed fails setting up the container with unmodified v1.  I'll try again with the new v2 now.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-10-07 14:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-06 12:40 [PATCH 0/3] Allow the group FD to remain open when unplugging a device Jason Gunthorpe
2022-10-06 12:40 ` [PATCH 1/3] vfio: Add vfio_file_is_group() Jason Gunthorpe
2022-10-06 18:11   ` Alex Williamson
2022-10-06 12:40 ` [PATCH 2/3] vfio: Hold a reference to the iommu_group in kvm for SPAPR Jason Gunthorpe
2022-10-06 12:40 ` [PATCH 3/3] vfio: Make the group FD disassociate from the iommu_group Jason Gunthorpe
2022-10-06 19:53 ` [PATCH 0/3] Allow the group FD to remain open when unplugging a device Alex Williamson
2022-10-06 22:42   ` Jason Gunthorpe
2022-10-06 23:28     ` Matthew Rosato
2022-10-07  1:46       ` Matthew Rosato
2022-10-07 13:37       ` Jason Gunthorpe
2022-10-07 14:37         ` Matthew Rosato
2022-10-07 14:39           ` Jason Gunthorpe
2022-10-07 14:46             ` Matthew Rosato
2022-10-07 11:17     ` Christian Borntraeger
