From: Jason Gunthorpe <jgg@nvidia.com>
To: iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	llvm@lists.linux.dev, Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>, Tom Rix <trix@redhat.com>,
	Will Deacon <will@kernel.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Nicolin Chen <nicolinc@nvidia.com>
Subject: [PATCH v2 03/14] iommu: Make __iommu_group_set_domain() handle error unwind
Date: Wed, 29 Mar 2023 20:40:27 -0300
Message-ID: <3-v2-cd32667d2ba6+70bd1-iommu_err_unwind_jgg@nvidia.com>
In-Reply-To: <0-v2-cd32667d2ba6+70bd1-iommu_err_unwind_jgg@nvidia.com>

Let's try to have a consistent and clear strategy for error handling
during domain attach failures.

There are two broad categories. The first is callers doing destruction and
trying to set the domain back to a previously good domain. These cases
cannot handle failure during destruction flows and must succeed, or at
least avoid a UAF on the current group->domain, which is likely about to
be freed.

Many of the drivers are well behaved here and will not hit the WARN_ONs
or a UAF, but some are doing hypercalls/etc that can fail unpredictably
and do not meet these expectations.

The second case is attaching a domain for the first time in a failable
context; on failure, the attachment should be restored back to
group->domain using the above must-succeed operation.
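
As an illustrative sketch of these two cases together, using the helpers
this patch adds (do_other_setup() is a hypothetical follow-on step, not
part of this patch):

	struct iommu_domain *old_domain = group->domain;
	int ret;

	/* Failable context: on error the helper restores group->domain */
	ret = __iommu_group_set_domain(group, new_domain);
	if (ret)
		return ret;

	ret = do_other_setup(group);	/* hypothetical */
	if (ret) {
		/* Destruction flow: must succeed, WARNs instead of failing */
		__iommu_group_set_domain_nofail(group, old_domain);
		return ret;
	}
	return 0;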

Have __iommu_group_set_domain_internal() execute a common algorithm that
tries to achieve this, and in the worst case leaves a device "detached" or
assigned to a global blocking domain. This relies on some existing common
driver behaviors, where an attach failure will also detach, and on true
IOMMU_DOMAIN_BLOCKED implementations that are never allowed to fail.
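
For reference, a minimal sketch of such a never-failing blocking domain,
following the global-static pattern relied on above (illustrative names
only; a real driver wires attach_dev to its own hardware routine):

	static int blocking_attach_dev(struct iommu_domain *domain,
				       struct device *dev)
	{
		/*
		 * Point the device at a pre-allocated "deny all" context.
		 * No allocation or hypercall is involved, so this cannot
		 * fail.
		 */
		return 0;
	}

	static const struct iommu_domain_ops blocking_domain_ops = {
		.attach_dev = blocking_attach_dev,
	};

	/* Global static: always exists, attaching it cannot fail */
	static struct iommu_domain blocking_domain = {
		.type = IOMMU_DOMAIN_BLOCKED,
		.ops = &blocking_domain_ops,
	};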

Name the first case __iommu_group_set_domain_nofail() to make its purpose
clear.

Pull all the error handling and WARN_ON generation into
__iommu_group_set_domain_internal().

Avoid the obfuscating use of __iommu_group_for_each_dev() and be more
careful about what should happen during failures by unwinding only the
devices we have already touched.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 130 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 108 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 501257bd02fc3c..b7cb2a56ebfe8c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -99,8 +99,26 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
 				struct iommu_group *group);
+
+enum {
+	IOMMU_SET_DOMAIN_MUST_SUCCEED = 1 << 0,
+};
+
+static int __iommu_group_set_domain_internal(struct iommu_group *group,
+					     struct iommu_domain *new_domain,
+					     unsigned int flags);
 static int __iommu_group_set_domain(struct iommu_group *group,
-				    struct iommu_domain *new_domain);
+				    struct iommu_domain *new_domain)
+{
+	return __iommu_group_set_domain_internal(group, new_domain, 0);
+}
+static void __iommu_group_set_domain_nofail(struct iommu_group *group,
+					    struct iommu_domain *new_domain)
+{
+	WARN_ON(__iommu_group_set_domain_internal(
+		group, new_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED));
+}
+
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
 					       struct device *dev);
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
@@ -2040,15 +2058,13 @@ EXPORT_SYMBOL_GPL(iommu_domain_free);
 static void __iommu_group_set_core_domain(struct iommu_group *group)
 {
 	struct iommu_domain *new_domain;
-	int ret;
 
 	if (group->owner)
 		new_domain = group->blocking_domain;
 	else
 		new_domain = group->default_domain;
 
-	ret = __iommu_group_set_domain(group, new_domain);
-	WARN(ret, "iommu driver failed to attach the default/blocking domain");
+	__iommu_group_set_domain_nofail(group, new_domain);
 }
 
 static int __iommu_attach_device(struct iommu_domain *domain,
@@ -2233,21 +2249,55 @@ int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_group);
 
-static int iommu_group_do_set_platform_dma(struct device *dev, void *data)
+static int __iommu_device_set_domain(struct iommu_group *group,
+				     struct device *dev,
+				     struct iommu_domain *new_domain,
+				     unsigned int flags)
 {
-	const struct iommu_ops *ops = dev_iommu_ops(dev);
-
-	if (!WARN_ON(!ops->set_platform_dma_ops))
-		ops->set_platform_dma_ops(dev);
+	int ret;
 
+	ret = __iommu_attach_device(new_domain, dev);
+	if (ret) {
+		/*
+		 * If we have a blocking domain then try to attach that in hopes
+		 * of avoiding a UAF. Modern drivers should implement blocking
+		 * domains as global statics that cannot fail.
+		 */
+		if ((flags & IOMMU_SET_DOMAIN_MUST_SUCCEED) &&
+		    group->blocking_domain &&
+		    group->blocking_domain != new_domain)
+			__iommu_attach_device(group->blocking_domain, dev);
+		return ret;
+	}
 	return 0;
 }
 
-static int __iommu_group_set_domain(struct iommu_group *group,
-				    struct iommu_domain *new_domain)
+/*
+ * If 0 is returned the group's domain is new_domain. If an error is returned
+ * then the group's domain will be set back to the existing domain unless
+ * IOMMU_SET_DOMAIN_MUST_SUCCEED, otherwise an error is returned and the
+ * group's domain is left inconsistent. It is a driver bug to fail attach
+ * with a previously good domain; we try to avoid a kernel UAF because of this.
+ *
+ * IOMMU groups are really the natural working unit of the IOMMU, but the IOMMU
+ * API works on domains and devices.  Bridge that gap by iterating over the
+ * devices in a group.  Ideally we'd have a single device which represents the
+ * requestor ID of the group, but we also allow IOMMU drivers to create policy
+ * defined minimum sets, where the physical hardware may be able to distinguish
+ * members, but we wish to group them at a higher level (ex. untrusted
+ * multi-function PCI devices).  Thus we attach each device.
+ */
+static int __iommu_group_set_domain_internal(struct iommu_group *group,
+					     struct iommu_domain *new_domain,
+					     unsigned int flags)
 {
+	struct group_device *last_gdev;
+	struct group_device *gdev;
+	int result;
 	int ret;
 
+	lockdep_assert_held(&group->mutex);
+
 	if (group->domain == new_domain)
 		return 0;
 
@@ -2257,8 +2307,12 @@ static int __iommu_group_set_domain(struct iommu_group *group,
 	 * platform specific behavior.
 	 */
 	if (!new_domain) {
-		__iommu_group_for_each_dev(group, NULL,
-					   iommu_group_do_set_platform_dma);
+		for_each_group_device(group, gdev) {
+			const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
+
+			if (!WARN_ON(!ops->set_platform_dma_ops))
+				ops->set_platform_dma_ops(gdev->dev);
+		}
 		group->domain = NULL;
 		return 0;
 	}
@@ -2272,12 +2326,47 @@ static int __iommu_group_set_domain(struct iommu_group *group,
 	 * Note that this is called in error unwind paths, attaching to a
 	 * domain that has already been attached cannot fail.
 	 */
-	ret = __iommu_group_for_each_dev(group, new_domain,
-					 iommu_group_do_attach_device);
-	if (ret)
-		return ret;
+	result = 0;
+	for_each_group_device(group, gdev) {
+		ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
+						flags);
+		if (ret) {
+			result = ret;
+			/*
+			 * Keep trying the other devices in the group. If a
+			 * driver fails to attach to an otherwise good domain,
+			 * does not support blocking domains, it should at least
+			 * drop its reference on the current domain so we don't
+			 * UAF.
+			 */
+			if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
+				continue;
+			goto err_revert;
+		}
+	}
 	group->domain = new_domain;
-	return 0;
+	return result;
+
+err_revert:
+	last_gdev = gdev;
+	for_each_group_device(group, gdev) {
+		const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
+
+		/*
+		 * If set_platform_dma_ops is not present a NULL domain can
+		 * happen only for first probe, in which case we leave
+		 * group->domain as NULL and let release clean everything up.
+		 */
+		if (group->domain)
+			WARN_ON(__iommu_device_set_domain(
+				group, gdev->dev, group->domain,
+				IOMMU_SET_DOMAIN_MUST_SUCCEED));
+		else if (ops->set_platform_dma_ops)
+			ops->set_platform_dma_ops(gdev->dev);
+		if (gdev == last_gdev)
+			break;
+	}
+	return ret;
 }
 
 void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
@@ -3194,16 +3283,13 @@ EXPORT_SYMBOL_GPL(iommu_device_claim_dma_owner);
 
 static void __iommu_release_dma_ownership(struct iommu_group *group)
 {
-	int ret;
-
 	if (WARN_ON(!group->owner_cnt || !group->owner ||
 		    !xa_empty(&group->pasid_array)))
 		return;
 
 	group->owner_cnt = 0;
 	group->owner = NULL;
-	ret = __iommu_group_set_domain(group, group->default_domain);
-	WARN(ret, "iommu driver failed to attach the default domain");
+	__iommu_group_set_domain_nofail(group, group->default_domain);
 }
 
 /**
-- 
2.40.0


