iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, Will Deacon <will@kernel.org>,
	Qian Cai <cai@lca.pw>, Joerg Roedel <jroedel@suse.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>
Subject: Re: [PATCH 4/4] iommu: Fix ordering of iommu_release_device()
Date: Fri, 9 Sep 2022 10:25:42 -0300	[thread overview]
Message-ID: <Yxs+1s+MPENLTUpG@nvidia.com> (raw)
In-Reply-To: <38bac59a-808d-5e91-227a-a3a06633c091@arm.com>

On Fri, Sep 09, 2022 at 10:05:58AM +0100, Robin Murphy wrote:
> On 2022-09-08 22:43, Jason Gunthorpe wrote:
> > On Thu, Sep 08, 2022 at 10:27:06PM +0100, Robin Murphy wrote:
> > 
> > > Oh, because s390 is using iommu_get_domain_for_dev() in its release_device
> > > callback, which needs to dereference the group to work, and the current
> > > domain may also be a non-default one which we can't prevent from
> > > disappearing racily, that was why :(
> > 
> > Hum, the issue there is the use of device->iommu_group - but that just
> > means I didn't split properly. How about this incremental:
> 
> That did cross my mind, but it's a bit grim.

Actually, also in my morning, I think it may not even be necessary.

Keep in mind the start of the series fixes VFIO.

The bug that S390 is trying to fix is that VFIO didn't put back the
group ownership, it just left its own iommu_domain attached and called
release().

But now, at least for single device groups, VFIO will put owenership
back and zdev->s390_domain == NULL when we get to release_device()

> That then only leaves the issue that that domain may still become
> invalid at any point after the group mutex has been dropped.

So that is this race:

        CPU0                         CPU1 
   iommu_release_device(a)
      __iommu_group_remove_device(a)
			         iommu_device_use_default_domain(b)
                                 iommu_domain_free(domain)
                                 iommu_release_device(b)
                                      ops->release_device(b)
      ops->release_device(a) 
        // Boom, a is still attached to domain :(

I can't think of how to solve this other than holding the group mutex
across release_device. See below.

> > And to your other question, the reason I split the function is because
> > I couldn't really say WTF iommu_group_remove_device() was supposed to
> > do. The __ version make ssense as part of the remove_device, due to
> > the sequencing with ops->release()
> > 
> > But the other one doesn't have that. So I want to put in a:
> > 
> >     WARN_ON(group->blocking_domain || group->default_domain);
> > 
> > Because calling it after those domains are allocated looks broken to
> > me.
> 
> I might be misunderstanding, but that sounds backwards - if a real device is
> being hotplugged out, we absolutely expect that to happen *after* its
> default domain has been set up.

See below for what I mean

iommu_group_remove_device() doesn't work as an API because it has no
way to tell the device to stop using the domain we are about to free.

So it should assert that there is no domain to worry about. For the
vfio and power case there is no domain because they don't use iommu
drivers

For FSL it does not use default domains so it will also be OK.

Also, I think FSL is broken and it should not be trying to remove the
"PCI controller" from a group. Every PCI device behind an IOMMU should
be linked to a group.

The only reason I can think someone would wanted to do this is because
they ran into trouble with the VFIO viability checks. But we have a
robust solution to that now that doesn't require abusing the group
like this.

Jason

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 780fb70715770d..cb83576b1877d5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -90,6 +90,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
 				      const char *buf, size_t count);
+static struct group_device *
+__iommu_group_remove_device(struct iommu_group *group, struct device *dev);
+static void __iommu_group_remove_device_sysfs(struct iommu_group *group,
+					      struct group_device *device);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
 struct iommu_group_attribute iommu_group_attr_##_name =		\
@@ -330,18 +334,50 @@ int iommu_probe_device(struct device *dev)
 
 void iommu_release_device(struct device *dev)
 {
+	struct iommu_group *group = dev->iommu_group;
 	const struct iommu_ops *ops;
+	struct group_device *device;
 
 	if (!dev->iommu)
 		return;
 
 	iommu_device_unlink(dev->iommu->iommu_dev, dev);
 
+	mutex_lock(&group->mutex);
+	device = __iommu_group_remove_device(group, dev);
 	ops = dev_iommu_ops(dev);
+
+	/*
+	 * If the group has become empty then ownership must have been released,
+	 * and the current domain must be set back to NULL or the default
+	 * domain.
+	 */
+	if (list_empty(&group->devices))
+		WARN_ON(group->owner_cnt ||
+			group->domain != group->default_domain);
+
+	/*
+	 * release_device() must stop using any attached domain on the device.
+	 * If there are still other devices in the group they are not effected
+	 * by this callback.
+	 *
+	 * The IOMMU driver must set the device to either an identity or
+	 * blocking translation and stop using any domain pointer, as it is
+	 * going to be freed.
+	 */
 	if (ops->release_device)
 		ops->release_device(dev);
+	mutex_unlock(&group->mutex);
+
+	__iommu_group_remove_device_sysfs(group, device);
+
+	/*
+	 * This will eventually call iommu_group_release() which will free the
+	 * iommu_domains.
+	 */
+	dev->iommu_group = NULL;
+	kobject_put(group->devices_kobj);
 
-	iommu_group_remove_device(dev);
 	module_put(ops->owner);
 	dev_iommu_free(dev);
 }
@@ -939,44 +975,69 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_group_add_device);
 
-/**
- * iommu_group_remove_device - remove a device from it's current group
- * @dev: device to be removed
- *
- * This function is called by an iommu driver to remove the device from
- * it's current group.  This decrements the iommu group reference count.
- */
-void iommu_group_remove_device(struct device *dev)
+static struct group_device *
+__iommu_group_remove_device(struct iommu_group *group, struct device *dev)
 {
-	struct iommu_group *group = dev->iommu_group;
-	struct group_device *tmp_device, *device = NULL;
+	struct group_device *device;
+
+	lockdep_assert_held(&group->mutex);
 
 	if (!group)
-		return;
+		return NULL;
 
 	dev_info(dev, "Removing from iommu group %d\n", group->id);
 
-	mutex_lock(&group->mutex);
-	list_for_each_entry(tmp_device, &group->devices, list) {
-		if (tmp_device->dev == dev) {
-			device = tmp_device;
+	list_for_each_entry(device, &group->devices, list) {
+		if (device->dev == dev) {
 			list_del(&device->list);
-			break;
+			return device;
 		}
 	}
-	mutex_unlock(&group->mutex);
+	return NULL;
+}
 
+static void __iommu_group_remove_device_sysfs(struct iommu_group *group,
+					      struct group_device *device)
+{
 	if (!device)
 		return;
 
 	sysfs_remove_link(group->devices_kobj, device->name);
-	sysfs_remove_link(&dev->kobj, "iommu_group");
+	sysfs_remove_link(&device->dev->kobj, "iommu_group");
 
-	trace_remove_device_from_group(group->id, dev);
+	trace_remove_device_from_group(group->id, device->dev);
 
 	kfree(device->name);
 	kfree(device);
-	dev->iommu_group = NULL;
+}
+
+/**
+ * iommu_group_remove_device - remove a device from it's current group
+ * @dev: device to be removed
+ *
+ * This function is called by an iommu driver to remove the device from
+ * it's current group.  This decrements the iommu group reference count.
+ */
+void iommu_group_remove_device(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+	struct group_device *device;
+
+	/*
+	 * Since we don't do ops->release_device() there is no way for the
+	 * driver to stop using any attached domain before we free it. If a
+	 * domain is attached while this function is called it will eventually
+	 * UAF.
+	 *
+	 * Thus it is only useful for cases like VFIO/SPAPR that don't use an
+	 * iommu driver, or for cases like FSL that don't use default domains.
+	 */
+	WARN_ON(group->domain);
+
+	mutex_lock(&group->mutex);
+	device = __iommu_group_remove_device(group, dev);
+	mutex_unlock(&group->mutex);
+	__iommu_group_remove_device_sysfs(group, device);
 	kobject_put(group->devices_kobj);
 }
 EXPORT_SYMBOL_GPL(iommu_group_remove_device);

  reply	other threads:[~2022-09-09 13:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-08 18:44 [PATCH 0/4] Fix splats releated to using the iommu_group after destroying devices Jason Gunthorpe
2022-09-08 18:44 ` [PATCH 1/4] vfio: Simplify vfio_create_group() Jason Gunthorpe
2022-09-20 19:45   ` Matthew Rosato
2022-09-08 18:44 ` [PATCH 2/4] vfio: Move the sanity check of the group to vfio_create_group() Jason Gunthorpe
2022-09-22 19:10   ` Alex Williamson
2022-09-22 19:36     ` Jason Gunthorpe
2022-09-22 21:23       ` Alex Williamson
2022-09-22 23:12         ` Jason Gunthorpe
2022-09-08 18:45 ` [PATCH 3/4] vfio: Follow a strict lifetime for struct iommu_group * Jason Gunthorpe
2022-09-20 19:32   ` Matthew Rosato
2022-09-08 18:45 ` [PATCH 4/4] iommu: Fix ordering of iommu_release_device() Jason Gunthorpe
2022-09-08 21:05   ` Robin Murphy
2022-09-08 21:27     ` Robin Murphy
2022-09-08 21:43       ` Jason Gunthorpe
2022-09-09  9:05         ` Robin Murphy
2022-09-09 13:25           ` Jason Gunthorpe [this message]
2022-09-09 17:57             ` Robin Murphy
2022-09-09 18:30               ` Jason Gunthorpe
2022-09-09 19:55                 ` Robin Murphy
2022-09-09 23:45                   ` Jason Gunthorpe
2022-09-12 11:13                     ` Robin Murphy
2022-09-22 16:56                       ` Jason Gunthorpe
2022-09-09 12:49 ` [PATCH 0/4] Fix splats releated to using the iommu_group after destroying devices Matthew Rosato
2022-09-09 16:24   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yxs+1s+MPENLTUpG@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=cai@lca.pw \
    --cc=cohuck@redhat.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=jroedel@suse.de \
    --cc=kvm@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mjrosato@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).