linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier()
@ 2022-02-28  0:50 Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 01/11] iommu: Add DMA ownership management interfaces Lu Baolu
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

Hi folks,

The iommu group is the minimal isolation boundary for DMA. Devices in
a group can access each other's MMIO registers via peer to peer DMA
and also need share the same I/O address space.

Once the I/O address space is assigned to user control it is no longer
available to the dma_map* API, which effectively makes the DMA API
non-working.

Second, userspace can use DMA initiated by a device that it controls
to access the MMIO spaces of other devices in the group. This allows
userspace to indirectly attack any kernel owned device and it's driver.

Therefore groups must either be entirely under kernel control or
userspace control, never a mixture. Unfortunately some systems have
problems with the granularity of groups and there are a couple of
important exceptions:

 - pci_stub allows the admin to block driver binding on a device and
   make it permanently shared with userspace. Since PCI stub does not
   do DMA it is safe, however the admin must understand that using
   pci_stub allows userspace to attack whatever device it was bound
   it.

 - PCI bridges are sometimes included in groups. Typically PCI bridges
   do not use DMA, and generally do not have MMIO regions.

Generally any device that does not have any MMIO registers is a
possible candidate for an exception.

Currently vfio adopts a workaround to detect violations of the above
restrictions by monitoring the driver core BOUND event, and hardwiring
the above exceptions. Since there is no way for vfio to reject driver
binding at this point, BUG_ON() is triggered if a violation is
captured (kernel driver BOUND event on a group which already has some
devices assigned to userspace). Aside from the bad user experience
this opens a way for root userspace to crash the kernel, even in high
integrity configurations, by manipulating the module binding and
triggering the BUG_ON.

This series solves this problem by making the user/kernel ownership a
core concept at the IOMMU layer. The driver core enforces kernel
ownership while drivers are bound and violations now result in a error
codes during probe, not BUG_ON failures.

Patch partitions:
  [PATCH 1-4]: Detect DMA ownership conflicts during driver binding;
  [PATCH 5-7]: Add security context management for assigned devices;
  [PATCH 8-11]: Various cleanups.

This is also part one of three initial series for IOMMUFD:
 * Move IOMMU Group security into the iommu layer
 - Generic IOMMUFD implementation
 - VFIO ability to consume IOMMUFD

Change log:
v1: initial post
  - https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/

v2:
  - https://lore.kernel.org/linux-iommu/20211128025051.355578-1-baolu.lu@linux.intel.com/

  - Move kernel dma ownership auto-claiming from driver core to bus
    callback. [Greg/Christoph/Robin/Jason]
    https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#m153706912b770682cb12e3c28f57e171aa1f9d0c

  - Code and interface refactoring for iommu_set/release_dma_owner()
    interfaces. [Jason]
    https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#mea70ed8e4e3665aedf32a5a0a7db095bf680325e

  - [NEW]Add new iommu_attach/detach_device_shared() interfaces for
    multiple devices group. [Robin/Jason]
    https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#mea70ed8e4e3665aedf32a5a0a7db095bf680325e

  - [NEW]Use iommu_attach/detach_device_shared() in drm/tegra drivers.

  - Refactoring and description refinement.

v3:
  - https://lore.kernel.org/linux-iommu/20211206015903.88687-1-baolu.lu@linux.intel.com/

  - Rename bus_type::dma_unconfigure to bus_type::dma_cleanup. [Greg]
    https://lore.kernel.org/linux-iommu/c3230ace-c878-39db-1663-2b752ff5384e@linux.intel.com/T/#m6711e041e47cb0cbe3964fad0a3466f5ae4b3b9b

  - Avoid _platform_dma_configure for platform_bus_type::dma_configure.
    [Greg]
    https://lore.kernel.org/linux-iommu/c3230ace-c878-39db-1663-2b752ff5384e@linux.intel.com/T/#m43fc46286611aa56a5c0eeaad99d539e5519f3f6

  - Patch "0012-iommu-Add-iommu_at-de-tach_device_shared-for-mult.patch"
    and "0018-drm-tegra-Use-the-iommu-dma_owner-mechanism.patch" have
    been tested by Dmitry Osipenko <digetx@gmail.com>.

v4:
  - https://lore.kernel.org/linux-iommu/20211217063708.1740334-1-baolu.lu@linux.intel.com/
  - Remove unnecessary tegra->domain chech in the tegra patch. (Jason)
  - Remove DMA_OWNER_NONE. (Joerg)
  - Change refcount to unsigned int. (Christoph)
  - Move mutex lock into group set_dma_owner functions. (Christoph)
  - Add kernel doc for iommu_attach/detach_domain_shared(). (Christoph)
  - Move dma auto-claim into driver core. (Jason/Christoph)

v5:
  - https://lore.kernel.org/linux-iommu/20220104015644.2294354-1-baolu.lu@linux.intel.com/
  - Move kernel dma ownership auto-claiming from driver core to bus
    callback. (Greg)
  - Refactor the iommu interfaces to make them more specific.
    (Jason/Robin)
  - Simplify the dma ownership implementation by removing the owner
    type. (Jason)
  - Commit message refactoring for PCI drivers. (Bjorn)
  - Move iommu_attach/detach_device() improvement patches into another
    series as there are a lot of code refactoring and cleanup staffs
    in various device drivers.

v6:
  - https://lore.kernel.org/linux-iommu/20220218005521.172832-1-baolu.lu@linux.intel.com/
  - Refine comments and commit mesages.
  - Rename iommu_group_set_dma_owner() to iommu_group_claim_dma_owner().
  - Rename iommu_device_use/unuse_kernel_dma() to
    iommu_device_use/unuse_default_domain().
  - Remove unnecessary EXPORT_SYMBOL_GPL.
  - Change flag name from no_kernel_api_dma to driver_managed_dma.
  - Merge 4 "Add driver dma ownership management" patches into single
    one.

v7:
  - We discussed about adding some fields in driver structure and
    intercepting it in the bus notifier for driver unbinding. We agreed
    that the driver structure should not be used out of the driver core.
  - As iommu_group_claim/release_dma_owner() are only used by the VFIO,
    there're no use cases for multiple calls for a single group.
  - Add some commit messages in "vfio: Set DMA ownership for
    VFIO" to describe the intentional enhancement of unsafe bridge
    drivers.
  - Comments refinement.

This is based on next branch of linux-iommu tree:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
and also available on github:
https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v7

Best regards,
baolu

Jason Gunthorpe (1):
  vfio: Delete the unbound_list

Lu Baolu (10):
  iommu: Add DMA ownership management interfaces
  driver core: Add dma_cleanup callback in bus_type
  amba: Stop sharing platform_dma_configure()
  bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
  PCI: pci_stub: Set driver_managed_dma
  PCI: portdrv: Set driver_managed_dma
  vfio: Set DMA ownership for VFIO devices
  vfio: Remove use of vfio_group_viable()
  vfio: Remove iommu group notifier
  iommu: Remove iommu group changes notifier

 include/linux/amba/bus.h              |   8 +
 include/linux/device/bus.h            |   3 +
 include/linux/fsl/mc.h                |   8 +
 include/linux/iommu.h                 |  54 +++---
 include/linux/pci.h                   |   8 +
 include/linux/platform_device.h       |  10 +-
 drivers/amba/bus.c                    |  39 +++-
 drivers/base/dd.c                     |   5 +
 drivers/base/platform.c               |  23 ++-
 drivers/bus/fsl-mc/fsl-mc-bus.c       |  26 ++-
 drivers/iommu/iommu.c                 | 228 ++++++++++++++++--------
 drivers/pci/pci-driver.c              |  21 +++
 drivers/pci/pci-stub.c                |   1 +
 drivers/pci/pcie/portdrv_pci.c        |   2 +
 drivers/vfio/fsl-mc/vfio_fsl_mc.c     |   1 +
 drivers/vfio/pci/vfio_pci.c           |   1 +
 drivers/vfio/platform/vfio_amba.c     |   1 +
 drivers/vfio/platform/vfio_platform.c |   1 +
 drivers/vfio/vfio.c                   | 245 ++------------------------
 19 files changed, 347 insertions(+), 338 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-03-04 10:34   ` Eric Auger
  2022-02-28  0:50 ` [PATCH v7 02/11] driver core: Add dma_cleanup callback in bus_type Lu Baolu
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

Multiple devices may be placed in the same IOMMU group because they
cannot be isolated from each other. These devices must either be
entirely under kernel control or userspace control, never a mixture.

This adds dma ownership management in iommu core and exposes several
interfaces for the device drivers and the device userspace assignment
framework (i.e. VFIO), so that any conflict between user and kernel
controlled dma could be detected at the beginning.

The device driver oriented interfaces are,

	int iommu_device_use_default_domain(struct device *dev);
	void iommu_device_unuse_default_domain(struct device *dev);

By calling iommu_device_use_default_domain(), the device driver tells
the iommu layer that the device dma is handled through the kernel DMA
APIs. The iommu layer will manage the IOVA and use the default domain
for DMA address translation.

The device user-space assignment framework oriented interfaces are,

	int iommu_group_claim_dma_owner(struct iommu_group *group,
					void *owner);
	void iommu_group_release_dma_owner(struct iommu_group *group);
	bool iommu_group_dma_owner_claimed(struct iommu_group *group);

The device userspace assignment must be disallowed if the DMA owner
claiming interface returns failure.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h |  31 +++++++++
 drivers/iommu/iommu.c | 153 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..77972ef978b5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -675,6 +675,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
+int iommu_device_use_default_domain(struct device *dev);
+void iommu_device_unuse_default_domain(struct device *dev);
+
+int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
+void iommu_group_release_dma_owner(struct iommu_group *group);
+bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1031,6 +1038,30 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline int iommu_device_use_default_domain(struct device *dev)
+{
+	return 0;
+}
+
+static inline void iommu_device_unuse_default_domain(struct device *dev)
+{
+}
+
+static inline int
+iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_group_release_dma_owner(struct iommu_group *group)
+{
+}
+
+static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
+{
+	return false;
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..eba8e8ccf19d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,8 @@ struct iommu_group {
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
 	struct list_head entry;
+	unsigned int owner_cnt;
+	void *owner;
 };
 
 struct group_device {
@@ -294,7 +296,11 @@ int iommu_probe_device(struct device *dev)
 	mutex_lock(&group->mutex);
 	iommu_alloc_default_domain(group, dev);
 
-	if (group->default_domain) {
+	/*
+	 * If device joined an existing group which has been claimed, don't
+	 * attach the default domain.
+	 */
+	if (group->default_domain && !group->owner) {
 		ret = __iommu_attach_device(group->default_domain, dev);
 		if (ret) {
 			mutex_unlock(&group->mutex);
@@ -2109,7 +2115,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (group->default_domain && group->domain != group->default_domain)
+	if (group->domain && group->domain != group->default_domain)
 		return -EBUSY;
 
 	ret = __iommu_group_for_each_dev(group, domain,
@@ -2146,7 +2152,11 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (!group->default_domain) {
+	/*
+	 * If the group has been claimed already, do not re-attach the default
+	 * domain.
+	 */
+	if (!group->default_domain || group->owner) {
 		__iommu_group_for_each_dev(group, domain,
 					   iommu_group_do_detach_device);
 		group->domain = NULL;
@@ -3095,3 +3105,140 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 	return ret;
 }
+
+/**
+ * iommu_device_use_default_domain() - Device driver wants to handle device
+ *                                     DMA through the kernel DMA API.
+ * @dev: The device.
+ *
+ * The device driver about to bind @dev wants to do DMA through the kernel
+ * DMA API. Return 0 if it is allowed, otherwise an error.
+ */
+int iommu_device_use_default_domain(struct device *dev)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+	int ret = 0;
+
+	if (!group)
+		return 0;
+
+	mutex_lock(&group->mutex);
+	if (group->owner_cnt) {
+		if (group->domain != group->default_domain ||
+		    group->owner) {
+			ret = -EBUSY;
+			goto unlock_out;
+		}
+	}
+
+	group->owner_cnt++;
+
+unlock_out:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+
+/**
+ * iommu_device_unuse_default_domain() - Device driver stops handling device
+ *                                       DMA through the kernel DMA API.
+ * @dev: The device.
+ *
+ * The device driver doesn't want to do DMA through kernel DMA API anymore.
+ * It must be called after iommu_device_use_default_domain().
+ */
+void iommu_device_unuse_default_domain(struct device *dev)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+
+	if (!group)
+		return;
+
+	mutex_lock(&group->mutex);
+	if (!WARN_ON(!group->owner_cnt))
+		group->owner_cnt--;
+
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+}
+
+/**
+ * iommu_group_claim_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: Caller specified pointer. Used for exclusive ownership.
+ *
+ * This is to support backward compatibility for vfio which manages
+ * the dma ownership in iommu_group level. New invocations on this
+ * interface should be prohibited.
+ */
+int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
+{
+	int ret = 0;
+
+	mutex_lock(&group->mutex);
+	if (group->owner_cnt) {
+		ret = -EPERM;
+		goto unlock_out;
+	} else {
+		if (group->domain && group->domain != group->default_domain) {
+			ret = -EBUSY;
+			goto unlock_out;
+		}
+
+		group->owner = owner;
+		if (group->domain)
+			__iommu_detach_group(group->domain, group);
+	}
+
+	group->owner_cnt++;
+unlock_out:
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
+
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ *
+ * Release the DMA ownership claimed by iommu_group_claim_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group)
+{
+	mutex_lock(&group->mutex);
+	if (WARN_ON(!group->owner_cnt || !group->owner))
+		goto unlock_out;
+
+	group->owner_cnt = 0;
+	/*
+	 * The UNMANAGED domain should be detached before all USER
+	 * owners have been released.
+	 */
+	if (!WARN_ON(group->domain) && group->default_domain)
+		__iommu_attach_group(group->default_domain, group);
+	group->owner = NULL;
+unlock_out:
+	mutex_unlock(&group->mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
+
+/**
+ * iommu_group_dma_owner_claimed() - Query group dma ownership status
+ * @group: The group.
+ *
+ * This provides status query on a given group. It is racy and only for
+ * non-binding status reporting.
+ */
+bool iommu_group_dma_owner_claimed(struct iommu_group *group)
+{
+	unsigned int user;
+
+	mutex_lock(&group->mutex);
+	user = group->owner_cnt;
+	mutex_unlock(&group->mutex);
+
+	return user;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 02/11] driver core: Add dma_cleanup callback in bus_type
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 01/11] iommu: Add DMA ownership management interfaces Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 03/11] amba: Stop sharing platform_dma_configure() Lu Baolu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

The bus_type structure defines dma_configure() callback for bus drivers
to configure DMA on the devices. This adds the paired dma_cleanup()
callback and calls it during driver unbinding so that bus drivers can do
some cleanup work.

One use case for this paired DMA callbacks is for the bus driver to check
for DMA ownership conflicts during driver binding, where multiple devices
belonging to a same IOMMU group (the minimum granularity of isolation and
protection) may be assigned to kernel drivers or user space respectively.

Without this change, for example, the vfio driver has to listen to a bus
BOUND_DRIVER event and then BUG_ON() in case of dma ownership conflict.
This leads to bad user experience since careless driver binding operation
may crash the system if the admin overlooks the group restriction. Aside
from bad design, this leads to a security problem as a root user, even with
lockdown=integrity, can force the kernel to BUG.

With this change, the bus driver could check and set the DMA ownership in
driver binding process and fail on ownership conflicts. The DMA ownership
should be released during driver unbinding.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/device/bus.h | 3 +++
 drivers/base/dd.c          | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index a039ab809753..d8b29ccd07e5 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -59,6 +59,8 @@ struct fwnode_handle;
  *		bus supports.
  * @dma_configure:	Called to setup DMA configuration on a device on
  *			this bus.
+ * @dma_cleanup:	Called to cleanup DMA configuration on a device on
+ *			this bus.
  * @pm:		Power management operations of this bus, callback the specific
  *		device driver's pm-ops.
  * @iommu_ops:  IOMMU specific operations for this bus, used to attach IOMMU
@@ -103,6 +105,7 @@ struct bus_type {
 	int (*num_vf)(struct device *dev);
 
 	int (*dma_configure)(struct device *dev);
+	void (*dma_cleanup)(struct device *dev);
 
 	const struct dev_pm_ops *pm;
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9eaaff2f556c..de05c5c60c6b 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -662,6 +662,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
+	if (dev->bus && dev->bus->dma_cleanup)
+		dev->bus->dma_cleanup(dev);
 pinctrl_bind_failed:
 	device_links_no_driver(dev);
 	devres_release_all(dev);
@@ -1205,6 +1207,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 		else if (drv->remove)
 			drv->remove(dev);
 
+		if (dev->bus && dev->bus->dma_cleanup)
+			dev->bus->dma_cleanup(dev);
+
 		device_links_driver_cleanup(dev);
 
 		devres_release_all(dev);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 03/11] amba: Stop sharing platform_dma_configure()
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 01/11] iommu: Add DMA ownership management interfaces Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 02/11] driver core: Add dma_cleanup callback in bus_type Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 04/11] bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management Lu Baolu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

Stop sharing platform_dma_configure() helper as they are about to have
their own bus dma_configure callbacks.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/platform_device.h |  2 --
 drivers/amba/bus.c              | 19 ++++++++++++++++++-
 drivers/base/platform.c         |  3 +--
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 7c96f169d274..17fde717df68 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -328,8 +328,6 @@ extern int platform_pm_restore(struct device *dev);
 #define platform_pm_restore		NULL
 #endif
 
-extern int platform_dma_configure(struct device *dev);
-
 #ifdef CONFIG_PM_SLEEP
 #define USE_PLATFORM_PM_SLEEP_OPS \
 	.suspend = platform_pm_suspend, \
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index e1a5eca3ae3c..8392f4aa251b 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -20,6 +20,8 @@
 #include <linux/platform_device.h>
 #include <linux/reset.h>
 #include <linux/of_irq.h>
+#include <linux/of_device.h>
+#include <linux/acpi.h>
 
 #define to_amba_driver(d)	container_of(d, struct amba_driver, drv)
 
@@ -273,6 +275,21 @@ static void amba_shutdown(struct device *dev)
 		drv->shutdown(to_amba_device(dev));
 }
 
+static int amba_dma_configure(struct device *dev)
+{
+	enum dev_dma_attr attr;
+	int ret = 0;
+
+	if (dev->of_node) {
+		ret = of_dma_configure(dev, dev->of_node, true);
+	} else if (has_acpi_companion(dev)) {
+		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
+		ret = acpi_dma_configure(dev, attr);
+	}
+
+	return ret;
+}
+
 #ifdef CONFIG_PM
 /*
  * Hooks to provide runtime PM of the pclk (bus clock).  It is safe to
@@ -341,7 +358,7 @@ struct bus_type amba_bustype = {
 	.probe		= amba_probe,
 	.remove		= amba_remove,
 	.shutdown	= amba_shutdown,
-	.dma_configure	= platform_dma_configure,
+	.dma_configure	= amba_dma_configure,
 	.pm		= &amba_pm,
 };
 EXPORT_SYMBOL_GPL(amba_bustype);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6cb04ac48bf0..acbc6eae37b8 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1454,8 +1454,7 @@ static void platform_shutdown(struct device *_dev)
 		drv->shutdown(dev);
 }
 
-
-int platform_dma_configure(struct device *dev)
+static int platform_dma_configure(struct device *dev)
 {
 	enum dev_dma_attr attr;
 	int ret = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 04/11] bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (2 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 03/11] amba: Stop sharing platform_dma_configure() Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma Lu Baolu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers
with the device DMA managed by kernel drivers or user-space applications.
Unfortunately, multiple devices may be placed in the same IOMMU group
because they cannot be isolated from each other. The DMA on these devices
must either be entirely under kernel control or userspace control, never
a mixture. Otherwise the driver integrity is not guaranteed because they
could access each other through the peer-to-peer accesses which by-pass
the IOMMU protection.

This checks and sets the default DMA mode during driver binding, and
cleanups during driver unbinding. In the default mode, the device DMA is
managed by the device driver which handles DMA operations through the
kernel DMA APIs (see Documentation/core-api/dma-api.rst).

For cases where the devices are assigned for userspace control through the
userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/
vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this
default setting in the assumption that the drivers know what they are
doing with the device DMA.

With the IOMMU layer knowing DMA ownership of each device, above problem
can be solved.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stuart Yoder <stuyoder@gmail.com>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/amba/bus.h        |  8 ++++++++
 include/linux/fsl/mc.h          |  8 ++++++++
 include/linux/pci.h             |  8 ++++++++
 include/linux/platform_device.h |  8 ++++++++
 drivers/amba/bus.c              | 20 ++++++++++++++++++++
 drivers/base/platform.c         | 20 ++++++++++++++++++++
 drivers/bus/fsl-mc/fsl-mc-bus.c | 26 ++++++++++++++++++++++++--
 drivers/pci/pci-driver.c        | 21 +++++++++++++++++++++
 8 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index 6c7f47846971..e9cd981be94e 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -79,6 +79,14 @@ struct amba_driver {
 	void			(*remove)(struct amba_device *);
 	void			(*shutdown)(struct amba_device *);
 	const struct amba_id	*id_table;
+	/*
+	 * For most device drivers, no need to care about this flag as long as
+	 * all DMAs are handled through the kernel DMA API. For some special
+	 * ones, for example VFIO drivers, they know how to manage the DMA
+	 * themselves and set this flag so that the IOMMU layer will allow them
+	 * to setup and manage their own I/O address space.
+	 */
+	bool driver_managed_dma;
 };
 
 /*
diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
index 7b6c42bfb660..27efef8affb1 100644
--- a/include/linux/fsl/mc.h
+++ b/include/linux/fsl/mc.h
@@ -32,6 +32,13 @@ struct fsl_mc_io;
  * @shutdown: Function called at shutdown time to quiesce the device
  * @suspend: Function called when a device is stopped
  * @resume: Function called when a device is resumed
+ * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA.
+ *		For most device drivers, no need to care about this flag
+ *		as long as all DMAs are handled through the kernel DMA API.
+ *		For some special ones, for example VFIO drivers, they know
+ *		how to manage the DMA themselves and set this flag so that
+ *		the IOMMU layer will allow them to setup and manage their
+ *		own I/O address space.
  *
  * Generic DPAA device driver object for device drivers that are registered
  * with a DPRC bus. This structure is to be embedded in each device-specific
@@ -45,6 +52,7 @@ struct fsl_mc_driver {
 	void (*shutdown)(struct fsl_mc_device *dev);
 	int (*suspend)(struct fsl_mc_device *dev, pm_message_t state);
 	int (*resume)(struct fsl_mc_device *dev);
+	bool driver_managed_dma;
 };
 
 #define to_fsl_mc_driver(_drv) \
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8253a5413d7c..b94bce839b83 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -894,6 +894,13 @@ struct module;
  *              created once it is bound to the driver.
  * @driver:	Driver model structure.
  * @dynids:	List of dynamically added device IDs.
+ * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA.
+ *		For most device drivers, no need to care about this flag
+ *		as long as all DMAs are handled through the kernel DMA API.
+ *		For some special ones, for example VFIO drivers, they know
+ *		how to manage the DMA themselves and set this flag so that
+ *		the IOMMU layer will allow them to setup and manage their
+ *		own I/O address space.
  */
 struct pci_driver {
 	struct list_head	node;
@@ -912,6 +919,7 @@ struct pci_driver {
 	const struct attribute_group **dev_groups;
 	struct device_driver	driver;
 	struct pci_dynids	dynids;
+	bool driver_managed_dma;
 };
 
 static inline struct pci_driver *to_pci_driver(struct device_driver *drv)
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 17fde717df68..b3d9c744f1e5 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -210,6 +210,14 @@ struct platform_driver {
 	struct device_driver driver;
 	const struct platform_device_id *id_table;
 	bool prevent_deferred_probe;
+	/*
+	 * For most device drivers, no need to care about this flag as long as
+	 * all DMAs are handled through the kernel DMA API. For some special
+	 * ones, for example VFIO drivers, they know how to manage the DMA
+	 * themselves and set this flag so that the IOMMU layer will allow them
+	 * to setup and manage their own I/O address space.
+	 */
+	bool driver_managed_dma;
 };
 
 #define to_platform_driver(drv)	(container_of((drv), struct platform_driver, \
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 8392f4aa251b..cebf03522524 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -22,6 +22,7 @@
 #include <linux/of_irq.h>
 #include <linux/of_device.h>
 #include <linux/acpi.h>
+#include <linux/iommu.h>
 
 #define to_amba_driver(d)	container_of(d, struct amba_driver, drv)
 
@@ -277,9 +278,16 @@ static void amba_shutdown(struct device *dev)
 
 static int amba_dma_configure(struct device *dev)
 {
+	struct amba_driver *drv = to_amba_driver(dev->driver);
 	enum dev_dma_attr attr;
 	int ret = 0;
 
+	if (!drv->driver_managed_dma) {
+		ret = iommu_device_use_default_domain(dev);
+		if (ret)
+			return ret;
+	}
+
 	if (dev->of_node) {
 		ret = of_dma_configure(dev, dev->of_node, true);
 	} else if (has_acpi_companion(dev)) {
@@ -287,9 +295,20 @@ static int amba_dma_configure(struct device *dev)
 		ret = acpi_dma_configure(dev, attr);
 	}
 
+	if (ret && !drv->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+
 	return ret;
 }
 
+static void amba_dma_cleanup(struct device *dev)
+{
+	struct amba_driver *drv = to_amba_driver(dev->driver);
+
+	if (!drv->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+}
+
 #ifdef CONFIG_PM
 /*
  * Hooks to provide runtime PM of the pclk (bus clock).  It is safe to
@@ -359,6 +378,7 @@ struct bus_type amba_bustype = {
 	.remove		= amba_remove,
 	.shutdown	= amba_shutdown,
 	.dma_configure	= amba_dma_configure,
+	.dma_cleanup	= amba_dma_cleanup,
 	.pm		= &amba_pm,
 };
 EXPORT_SYMBOL_GPL(amba_bustype);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index acbc6eae37b8..ad8ea9453cdb 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -30,6 +30,7 @@
 #include <linux/property.h>
 #include <linux/kmemleak.h>
 #include <linux/types.h>
+#include <linux/iommu.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -1456,9 +1457,16 @@ static void platform_shutdown(struct device *_dev)
 
 static int platform_dma_configure(struct device *dev)
 {
+	struct platform_driver *drv = to_platform_driver(dev->driver);
 	enum dev_dma_attr attr;
 	int ret = 0;
 
+	if (!drv->driver_managed_dma) {
+		ret = iommu_device_use_default_domain(dev);
+		if (ret)
+			return ret;
+	}
+
 	if (dev->of_node) {
 		ret = of_dma_configure(dev, dev->of_node, true);
 	} else if (has_acpi_companion(dev)) {
@@ -1466,9 +1474,20 @@ static int platform_dma_configure(struct device *dev)
 		ret = acpi_dma_configure(dev, attr);
 	}
 
+	if (ret && !drv->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+
 	return ret;
 }
 
+static void platform_dma_cleanup(struct device *dev)
+{
+	struct platform_driver *drv = to_platform_driver(dev->driver);
+
+	if (!drv->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+}
+
 static const struct dev_pm_ops platform_dev_pm_ops = {
 	SET_RUNTIME_PM_OPS(pm_generic_runtime_suspend, pm_generic_runtime_resume, NULL)
 	USE_PLATFORM_PM_SLEEP_OPS
@@ -1483,6 +1502,7 @@ struct bus_type platform_bus_type = {
 	.remove		= platform_remove,
 	.shutdown	= platform_shutdown,
 	.dma_configure	= platform_dma_configure,
+	.dma_cleanup	= platform_dma_cleanup,
 	.pm		= &platform_dev_pm_ops,
 };
 EXPORT_SYMBOL_GPL(platform_bus_type);
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index 8fd4a356a86e..eca3406a14ce 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -140,15 +140,36 @@ static int fsl_mc_dma_configure(struct device *dev)
 {
 	struct device *dma_dev = dev;
 	struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
+	struct fsl_mc_driver *mc_drv = to_fsl_mc_driver(dev->driver);
 	u32 input_id = mc_dev->icid;
+	int ret;
+
+	if (!mc_drv->driver_managed_dma) {
+		ret = iommu_device_use_default_domain(dev);
+		if (ret)
+			return ret;
+	}
 
 	while (dev_is_fsl_mc(dma_dev))
 		dma_dev = dma_dev->parent;
 
 	if (dev_of_node(dma_dev))
-		return of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id);
+		ret = of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id);
+	else
+		ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
+
+	if (ret && !mc_drv->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+
+	return ret;
+}
+
+static void fsl_mc_dma_cleanup(struct device *dev)
+{
+	struct fsl_mc_driver *mc_drv = to_fsl_mc_driver(dev->driver);
 
-	return acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
+	if (!mc_drv->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
 }
 
 static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
@@ -312,6 +333,7 @@ struct bus_type fsl_mc_bus_type = {
 	.match = fsl_mc_bus_match,
 	.uevent = fsl_mc_bus_uevent,
 	.dma_configure  = fsl_mc_dma_configure,
+	.dma_cleanup = fsl_mc_dma_cleanup,
 	.dev_groups = fsl_mc_dev_groups,
 	.bus_groups = fsl_mc_bus_groups,
 };
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 588588cfda48..893a8707c179 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -20,6 +20,7 @@
 #include <linux/of_device.h>
 #include <linux/acpi.h>
 #include <linux/dma-map-ops.h>
+#include <linux/iommu.h>
 #include "pci.h"
 #include "pcie/portdrv.h"
 
@@ -1590,9 +1591,16 @@ static int pci_bus_num_vf(struct device *dev)
  */
 static int pci_dma_configure(struct device *dev)
 {
+	struct pci_driver *driver = to_pci_driver(dev->driver);
 	struct device *bridge;
 	int ret = 0;
 
+	if (!driver->driver_managed_dma) {
+		ret = iommu_device_use_default_domain(dev);
+		if (ret)
+			return ret;
+	}
+
 	bridge = pci_get_host_bridge_device(to_pci_dev(dev));
 
 	if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
@@ -1605,9 +1613,21 @@ static int pci_dma_configure(struct device *dev)
 	}
 
 	pci_put_host_bridge_device(bridge);
+
+	if (ret && !driver->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+
 	return ret;
 }
 
+static void pci_dma_cleanup(struct device *dev)
+{
+	struct pci_driver *driver = to_pci_driver(dev->driver);
+
+	if (!driver->driver_managed_dma)
+		iommu_device_unuse_default_domain(dev);
+}
+
 struct bus_type pci_bus_type = {
 	.name		= "pci",
 	.match		= pci_bus_match,
@@ -1621,6 +1641,7 @@ struct bus_type pci_bus_type = {
 	.pm		= PCI_PM_OPS_PTR,
 	.num_vf		= pci_bus_num_vf,
 	.dma_configure	= pci_dma_configure,
+	.dma_cleanup	= pci_dma_cleanup,
 };
 EXPORT_SYMBOL(pci_bus_type);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (3 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 04/11] bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28 19:54   ` Bjorn Helgaas
  2022-02-28  0:50 ` [PATCH v7 06/11] PCI: portdrv: " Lu Baolu
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

The current VFIO implementation allows pci-stub driver to be bound to
a PCI device with other devices in the same IOMMU group being assigned
to userspace. The pci-stub driver has no dependencies on DMA or the
IOVA mapping of the device, but it does prevent the user from having
direct access to the device, which is useful in some circumstances.

The pci_dma_configure() marks the iommu_group as containing only devices
with kernel drivers that manage DMA. For compatibility with the VFIO
usage, avoid this default behavior for the pci_stub. This allows the
pci_stub still able to be used by the admin to block driver binding after
applying the DMA ownership to VFIO.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/pci/pci-stub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
index e408099fea52..d1f4c1ce7bd1 100644
--- a/drivers/pci/pci-stub.c
+++ b/drivers/pci/pci-stub.c
@@ -36,6 +36,7 @@ static struct pci_driver stub_driver = {
 	.name		= "pci-stub",
 	.id_table	= NULL,	/* only dynamic id's */
 	.probe		= pci_stub_probe,
+	.driver_managed_dma = true,
 };
 
 static int __init pci_stub_init(void)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 06/11] PCI: portdrv: Set driver_managed_dma
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (4 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28 19:56   ` Bjorn Helgaas
  2022-02-28  0:50 ` [PATCH v7 07/11] vfio: Set DMA ownership for VFIO devices Lu Baolu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

If a switch lacks ACS P2P Request Redirect, a device below the switch can
bypass the IOMMU and DMA directly to other devices below the switch, so
all the downstream devices must be in the same IOMMU group as the switch
itself.

The existing VFIO framework allows the portdrv driver to be bound to the
bridge while its downstream devices are assigned to user space. The
pci_dma_configure() marks the IOMMU group as containing only devices
with kernel drivers that manage DMA. Avoid this default behavior for the
portdrv driver in order for compatibility with the current VFIO usage.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/pci/pcie/portdrv_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 35eca6277a96..6b2adb678c21 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
+	.driver_managed_dma = true,
+
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 07/11] vfio: Set DMA ownership for VFIO devices
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (5 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 06/11] PCI: portdrv: " Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28 22:06   ` Alex Williamson
  2022-02-28  0:50 ` [PATCH v7 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

Claim group dma ownership when an IOMMU group is set to a container,
and release the dma ownership once the iommu group is unset from the
container.

This change disallows some unsafe bridge drivers to bind to non-ACS
bridges while devices under them are assigned to user space. This is an
intentional enhancement and possibly breaks some existing
configurations. The recommendation to such an affected user would be
that the previously allowed host bridge driver was unsafe for this use
case and to continue to enable assignment of devices within that group,
the driver should be unbound from the bridge device or replaced with the
pci-stub driver.

For any bridge driver, we consider it unsafe if it satisfies any of the
following conditions:

  1) The bridge driver uses DMA. Calling pci_set_master() or calling any
     kernel DMA API (dma_map_*() and etc.) is an indicate that the
     driver is doing DMA.

  2) If the bridge driver uses MMIO, it should be tolerant to hostile
     userspace also touching the same MMIO registers via P2P DMA
     attacks.

If the bridge driver turns out to be a safe one, it could be used as
before by setting the driver's .driver_managed_dma field, just like what
we have done in the pcieport driver.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c     |  1 +
 drivers/vfio/pci/vfio_pci.c           |  1 +
 drivers/vfio/platform/vfio_amba.c     |  1 +
 drivers/vfio/platform/vfio_platform.c |  1 +
 drivers/vfio/vfio.c                   | 10 +++++++++-
 5 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index 6e2e62c6f47a..3feff729f3ce 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -588,6 +588,7 @@ static struct fsl_mc_driver vfio_fsl_mc_driver = {
 		.name	= "vfio-fsl-mc",
 		.owner	= THIS_MODULE,
 	},
+	.driver_managed_dma = true,
 };
 
 static int __init vfio_fsl_mc_driver_init(void)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index a5ce92beb655..941909d3918b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -193,6 +193,7 @@ static struct pci_driver vfio_pci_driver = {
 	.remove			= vfio_pci_remove,
 	.sriov_configure	= vfio_pci_sriov_configure,
 	.err_handler		= &vfio_pci_core_err_handlers,
+	.driver_managed_dma	= true,
 };
 
 static void __init vfio_pci_fill_ids(void)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index badfffea14fb..1aaa4f721bd2 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -95,6 +95,7 @@ static struct amba_driver vfio_amba_driver = {
 		.name = "vfio-amba",
 		.owner = THIS_MODULE,
 	},
+	.driver_managed_dma = true,
 };
 
 module_amba_driver(vfio_amba_driver);
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 68a1c87066d7..04f40c5acfd6 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -76,6 +76,7 @@ static struct platform_driver vfio_platform_driver = {
 	.driver	= {
 		.name	= "vfio-platform",
 	},
+	.driver_managed_dma = true,
 };
 
 module_platform_driver(vfio_platform_driver);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 735d1d344af9..df9d4b60e5ae 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1198,6 +1198,8 @@ static void __vfio_group_unset_container(struct vfio_group *group)
 		driver->ops->detach_group(container->iommu_data,
 					  group->iommu_group);
 
+	iommu_group_release_dma_owner(group->iommu_group);
+
 	group->container = NULL;
 	wake_up(&group->container_q);
 	list_del(&group->container_next);
@@ -1282,13 +1284,19 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 		goto unlock_out;
 	}
 
+	ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
+	if (ret)
+		goto unlock_out;
+
 	driver = container->iommu_driver;
 	if (driver) {
 		ret = driver->ops->attach_group(container->iommu_data,
 						group->iommu_group,
 						group->type);
-		if (ret)
+		if (ret) {
+			iommu_group_release_dma_owner(group->iommu_group);
 			goto unlock_out;
+		}
 	}
 
 	group->container = container;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 08/11] vfio: Remove use of vfio_group_viable()
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (6 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 07/11] vfio: Set DMA ownership for VFIO devices Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28 22:06   ` Alex Williamson
  2022-02-28  0:50 ` [PATCH v7 09/11] vfio: Delete the unbound_list Lu Baolu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

As DMA ownership is claimed for the iommu group when a VFIO group is
added to a VFIO container, the VFIO group viability is guaranteed as long
as group->container_users > 0. Remove those unnecessary group viability
checks which are only hit when group->container_users is not zero.

The only remaining reference is in GROUP_GET_STATUS, which could be called
at any time when group fd is valid. Here we just replace the
vfio_group_viable() by directly calling IOMMU core to get viability status.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index df9d4b60e5ae..73034446e03f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1313,12 +1313,6 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 	return ret;
 }
 
-static bool vfio_group_viable(struct vfio_group *group)
-{
-	return (iommu_group_for_each_dev(group->iommu_group,
-					 group, vfio_dev_viable) == 0);
-}
-
 static int vfio_group_add_container_user(struct vfio_group *group)
 {
 	if (!atomic_inc_not_zero(&group->container_users))
@@ -1328,7 +1322,7 @@ static int vfio_group_add_container_user(struct vfio_group *group)
 		atomic_dec(&group->container_users);
 		return -EPERM;
 	}
-	if (!group->container->iommu_driver || !vfio_group_viable(group)) {
+	if (!group->container->iommu_driver) {
 		atomic_dec(&group->container_users);
 		return -EINVAL;
 	}
@@ -1346,7 +1340,7 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	int ret = 0;
 
 	if (0 == atomic_read(&group->container_users) ||
-	    !group->container->iommu_driver || !vfio_group_viable(group))
+	    !group->container->iommu_driver)
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
@@ -1438,11 +1432,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 
 		status.flags = 0;
 
-		if (vfio_group_viable(group))
-			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
-
 		if (group->container)
-			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET;
+			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
+					VFIO_GROUP_FLAGS_VIABLE;
+		else if (!iommu_group_dma_owner_claimed(group->iommu_group))
+			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
 
 		if (copy_to_user((void __user *)arg, &status, minsz))
 			return -EFAULT;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 09/11] vfio: Delete the unbound_list
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (7 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28 22:06   ` Alex Williamson
  2022-02-28  0:50 ` [PATCH v7 10/11] vfio: Remove iommu group notifier Lu Baolu
  2022-02-28  0:50 ` [PATCH v7 11/11] iommu: Remove iommu group changes notifier Lu Baolu
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

From: Jason Gunthorpe <jgg@nvidia.com>

commit 60720a0fc646 ("vfio: Add device tracking during unbind") added the
unbound list to plug a problem with KVM where KVM_DEV_VFIO_GROUP_DEL
relied on vfio_group_get_external_user() succeeding to return the
vfio_group from a group file descriptor. The unbound list allowed
vfio_group_get_external_user() to continue to succeed in edge cases.

However commit 5d6dee80a1e9 ("vfio: New external user group/file match")
deleted the call to vfio_group_get_external_user() during
KVM_DEV_VFIO_GROUP_DEL. Instead vfio_external_group_match_file() is used
to directly match the file descriptor to the group pointer.

This in turn avoids the call down to vfio_dev_viable() during
KVM_DEV_VFIO_GROUP_DEL and also avoids the trouble the first commit was
trying to fix.

There are no other users of vfio_dev_viable() that care about the time
after vfio_unregister_group_dev() returns, so simply delete the
unbound_list entirely.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 74 ++-------------------------------------------
 1 file changed, 2 insertions(+), 72 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 73034446e03f..e0df2bc692b2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -62,11 +62,6 @@ struct vfio_container {
 	bool				noiommu;
 };
 
-struct vfio_unbound_dev {
-	struct device			*dev;
-	struct list_head		unbound_next;
-};
-
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -79,8 +74,6 @@ struct vfio_group {
 	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
-	struct list_head		unbound_list;
-	struct mutex			unbound_lock;
 	atomic_t			opened;
 	wait_queue_head_t		container_q;
 	enum vfio_group_type		type;
@@ -340,16 +333,8 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 static void vfio_group_release(struct device *dev)
 {
 	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
-	struct vfio_unbound_dev *unbound, *tmp;
-
-	list_for_each_entry_safe(unbound, tmp,
-				 &group->unbound_list, unbound_next) {
-		list_del(&unbound->unbound_next);
-		kfree(unbound);
-	}
 
 	mutex_destroy(&group->device_lock);
-	mutex_destroy(&group->unbound_lock);
 	iommu_group_put(group->iommu_group);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
@@ -381,8 +366,6 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 	refcount_set(&group->users, 1);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
-	INIT_LIST_HEAD(&group->unbound_list);
-	mutex_init(&group->unbound_lock);
 	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
 	/* put in vfio_group_release() */
@@ -571,19 +554,8 @@ static int vfio_dev_viable(struct device *dev, void *data)
 	struct vfio_group *group = data;
 	struct vfio_device *device;
 	struct device_driver *drv = READ_ONCE(dev->driver);
-	struct vfio_unbound_dev *unbound;
-	int ret = -EINVAL;
 
-	mutex_lock(&group->unbound_lock);
-	list_for_each_entry(unbound, &group->unbound_list, unbound_next) {
-		if (dev == unbound->dev) {
-			ret = 0;
-			break;
-		}
-	}
-	mutex_unlock(&group->unbound_lock);
-
-	if (!ret || !drv || vfio_dev_driver_allowed(dev, drv))
+	if (!drv || vfio_dev_driver_allowed(dev, drv))
 		return 0;
 
 	device = vfio_group_get_device(group, dev);
@@ -592,7 +564,7 @@ static int vfio_dev_viable(struct device *dev, void *data)
 		return 0;
 	}
 
-	return ret;
+	return -EINVAL;
 }
 
 /*
@@ -634,7 +606,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 {
 	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
 	struct device *dev = data;
-	struct vfio_unbound_dev *unbound;
 
 	switch (action) {
 	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
@@ -663,28 +634,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 			__func__, iommu_group_id(group->iommu_group),
 			dev->driver->name);
 		break;
-	case IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbound from driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		/*
-		 * XXX An unbound device in a live group is ok, but we'd
-		 * really like to avoid the above BUG_ON by preventing other
-		 * drivers from binding to it.  Once that occurs, we have to
-		 * stop the system to maintain isolation.  At a minimum, we'd
-		 * want a toggle to disable driver auto probe for this device.
-		 */
-
-		mutex_lock(&group->unbound_lock);
-		list_for_each_entry(unbound,
-				    &group->unbound_list, unbound_next) {
-			if (dev == unbound->dev) {
-				list_del(&unbound->unbound_next);
-				kfree(unbound);
-				break;
-			}
-		}
-		mutex_unlock(&group->unbound_lock);
-		break;
 	}
 	return NOTIFY_OK;
 }
@@ -889,29 +838,10 @@ static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 void vfio_unregister_group_dev(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
-	struct vfio_unbound_dev *unbound;
 	unsigned int i = 0;
 	bool interrupted = false;
 	long rc;
 
-	/*
-	 * When the device is removed from the group, the group suddenly
-	 * becomes non-viable; the device has a driver (until the unbind
-	 * completes), but it's not present in the group.  This is bad news
-	 * for any external users that need to re-acquire a group reference
-	 * in order to match and release their existing reference.  To
-	 * solve this, we track such devices on the unbound_list to bridge
-	 * the gap until they're fully unbound.
-	 */
-	unbound = kzalloc(sizeof(*unbound), GFP_KERNEL);
-	if (unbound) {
-		unbound->dev = device->dev;
-		mutex_lock(&group->unbound_lock);
-		list_add(&unbound->unbound_next, &group->unbound_list);
-		mutex_unlock(&group->unbound_lock);
-	}
-	WARN_ON(!unbound);
-
 	vfio_device_put(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 10/11] vfio: Remove iommu group notifier
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (8 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 09/11] vfio: Delete the unbound_list Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  2022-02-28 22:06   ` Alex Williamson
  2022-02-28  0:50 ` [PATCH v7 11/11] iommu: Remove iommu group changes notifier Lu Baolu
  10 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
has been called. There's no need to register iommu group notifier. This
removes the iommu group notifer which contains BUG_ON() and WARN().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 147 --------------------------------------------
 1 file changed, 147 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index e0df2bc692b2..dd3fac0d6bc9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -71,7 +71,6 @@ struct vfio_group {
 	struct vfio_container		*container;
 	struct list_head		device_list;
 	struct mutex			device_lock;
-	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
 	atomic_t			opened;
@@ -274,8 +273,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
 
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data);
 static void vfio_group_get(struct vfio_group *group);
 
 /*
@@ -395,13 +392,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 		goto err_put;
 	}
 
-	group->nb.notifier_call = vfio_iommu_group_notifier;
-	err = iommu_group_register_notifier(iommu_group, &group->nb);
-	if (err) {
-		ret = ERR_PTR(err);
-		goto err_put;
-	}
-
 	mutex_lock(&vfio.group_lock);
 
 	/* Did we race creating this group? */
@@ -422,7 +412,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 err_unlock:
 	mutex_unlock(&vfio.group_lock);
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 err_put:
 	put_device(&group->dev);
 	return ret;
@@ -447,7 +436,6 @@ static void vfio_group_put(struct vfio_group *group)
 	cdev_device_del(&group->cdev, &group->dev);
 	mutex_unlock(&vfio.group_lock);
 
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 	put_device(&group->dev);
 }
 
@@ -503,141 +491,6 @@ static struct vfio_device *vfio_group_get_device(struct vfio_group *group,
 	return NULL;
 }
 
-/*
- * Some drivers, like pci-stub, are only used to prevent other drivers from
- * claiming a device and are therefore perfectly legitimate for a user owned
- * group.  The pci-stub driver has no dependencies on DMA or the IOVA mapping
- * of the device, but it does prevent the user from having direct access to
- * the device, which is useful in some circumstances.
- *
- * We also assume that we can include PCI interconnect devices, ie. bridges.
- * IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
- * then all of the downstream devices will be part of the same IOMMU group as
- * the bridge.  Thus, if placing the bridge into the user owned IOVA space
- * breaks anything, it only does so for user owned devices downstream.  Note
- * that error notification via MSI can be affected for platforms that handle
- * MSI within the same IOVA space as DMA.
- */
-static const char * const vfio_driver_allowed[] = { "pci-stub" };
-
-static bool vfio_dev_driver_allowed(struct device *dev,
-				    struct device_driver *drv)
-{
-	if (dev_is_pci(dev)) {
-		struct pci_dev *pdev = to_pci_dev(dev);
-
-		if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-			return true;
-	}
-
-	return match_string(vfio_driver_allowed,
-			    ARRAY_SIZE(vfio_driver_allowed),
-			    drv->name) >= 0;
-}
-
-/*
- * A vfio group is viable for use by userspace if all devices are in
- * one of the following states:
- *  - driver-less
- *  - bound to a vfio driver
- *  - bound to an otherwise allowed driver
- *  - a PCI interconnect device
- *
- * We use two methods to determine whether a device is bound to a vfio
- * driver.  The first is to test whether the device exists in the vfio
- * group.  The second is to test if the device exists on the group
- * unbound_list, indicating it's in the middle of transitioning from
- * a vfio driver to driver-less.
- */
-static int vfio_dev_viable(struct device *dev, void *data)
-{
-	struct vfio_group *group = data;
-	struct vfio_device *device;
-	struct device_driver *drv = READ_ONCE(dev->driver);
-
-	if (!drv || vfio_dev_driver_allowed(dev, drv))
-		return 0;
-
-	device = vfio_group_get_device(group, dev);
-	if (device) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	return -EINVAL;
-}
-
-/*
- * Async device support
- */
-static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
-{
-	struct vfio_device *device;
-
-	/* Do we already know about it?  We shouldn't */
-	device = vfio_group_get_device(group, dev);
-	if (WARN_ON_ONCE(device)) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	/* Nothing to do for idle groups */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	/* TODO Prevent device auto probing */
-	dev_WARN(dev, "Device added to live group %d!\n",
-		 iommu_group_id(group->iommu_group));
-
-	return 0;
-}
-
-static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
-{
-	/* We don't care what happens when the group isn't in use */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	return vfio_dev_viable(dev, group);
-}
-
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data)
-{
-	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
-	struct device *dev = data;
-
-	switch (action) {
-	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
-		vfio_group_nb_add_dev(group, dev);
-		break;
-	case IOMMU_GROUP_NOTIFY_DEL_DEVICE:
-		/*
-		 * Nothing to do here.  If the device is in use, then the
-		 * vfio sub-driver should block the remove callback until
-		 * it is unused.  If the device is unused or attached to a
-		 * stub driver, then it should be released and we don't
-		 * care that it will be going away.
-		 */
-		break;
-	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
-		dev_dbg(dev, "%s: group %d binding to driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		break;
-	case IOMMU_GROUP_NOTIFY_BOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d bound to driver %s\n", __func__,
-			iommu_group_id(group->iommu_group), dev->driver->name);
-		BUG_ON(vfio_group_nb_verify(group, dev));
-		break;
-	case IOMMU_GROUP_NOTIFY_UNBIND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbinding from driver %s\n",
-			__func__, iommu_group_id(group->iommu_group),
-			dev->driver->name);
-		break;
-	}
-	return NOTIFY_OK;
-}
-
 /*
  * VFIO driver API
  */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 11/11] iommu: Remove iommu group changes notifier
  2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (9 preceding siblings ...)
  2022-02-28  0:50 ` [PATCH v7 10/11] vfio: Remove iommu group notifier Lu Baolu
@ 2022-02-28  0:50 ` Lu Baolu
  10 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2022-02-28  0:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu, Christoph Hellwig

The iommu group changes notifer is not referenced in the tree. Remove it
to avoid dead code.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/iommu.h | 23 -------------
 drivers/iommu/iommu.c | 75 -------------------------------------------
 2 files changed, 98 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 77972ef978b5..6ef2df258673 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -407,13 +407,6 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
 	return dev->iommu->iommu_dev->ops;
 }
 
-#define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
-#define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
-#define IOMMU_GROUP_NOTIFY_BIND_DRIVER		3 /* Pre Driver bind */
-#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
-#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
-#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
-
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
@@ -478,10 +471,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
 extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
-extern int iommu_group_register_notifier(struct iommu_group *group,
-					 struct notifier_block *nb);
-extern int iommu_group_unregister_notifier(struct iommu_group *group,
-					   struct notifier_block *nb);
 extern int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
 					void *data);
@@ -878,18 +867,6 @@ static inline void iommu_group_put(struct iommu_group *group)
 {
 }
 
-static inline int iommu_group_register_notifier(struct iommu_group *group,
-						struct notifier_block *nb)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_group_unregister_notifier(struct iommu_group *group,
-						  struct notifier_block *nb)
-{
-	return 0;
-}
-
 static inline
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index eba8e8ccf19d..0c42ece25854 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -18,7 +18,6 @@
 #include <linux/errno.h>
 #include <linux/iommu.h>
 #include <linux/idr.h>
-#include <linux/notifier.h>
 #include <linux/err.h>
 #include <linux/pci.h>
 #include <linux/bitops.h>
@@ -40,7 +39,6 @@ struct iommu_group {
 	struct kobject *devices_kobj;
 	struct list_head devices;
 	struct mutex mutex;
-	struct blocking_notifier_head notifier;
 	void *iommu_data;
 	void (*iommu_data_release)(void *iommu_data);
 	char *name;
@@ -632,7 +630,6 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
-	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0) {
@@ -905,10 +902,6 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	if (ret)
 		goto err_put_group;
 
-	/* Notify any listeners about change to group. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
-
 	trace_add_device_to_group(group->id, dev);
 
 	dev_info(dev, "Adding to iommu group %d\n", group->id);
@@ -950,10 +943,6 @@ void iommu_group_remove_device(struct device *dev)
 
 	dev_info(dev, "Removing from iommu group %d\n", group->id);
 
-	/* Pre-notify listeners that a device is being removed. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
-
 	mutex_lock(&group->mutex);
 	list_for_each_entry(tmp_device, &group->devices, list) {
 		if (tmp_device->dev == dev) {
@@ -1075,36 +1064,6 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
-/**
- * iommu_group_register_notifier - Register a notifier for group changes
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * This function allows iommu group users to track changes in a group.
- * See include/linux/iommu.h for actions sent via this notifier.  Caller
- * should hold a reference to the group throughout notifier registration.
- */
-int iommu_group_register_notifier(struct iommu_group *group,
-				  struct notifier_block *nb)
-{
-	return blocking_notifier_chain_register(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
-
-/**
- * iommu_group_unregister_notifier - Unregister a notifier
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * Unregister a previously registered group notifier block.
- */
-int iommu_group_unregister_notifier(struct iommu_group *group,
-				    struct notifier_block *nb)
-{
-	return blocking_notifier_chain_unregister(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
-
 /**
  * iommu_register_device_fault_handler() - Register a device fault handler
  * @dev: the device
@@ -1650,14 +1609,8 @@ static int remove_iommu_group(struct device *dev, void *data)
 static int iommu_bus_notifier(struct notifier_block *nb,
 			      unsigned long action, void *data)
 {
-	unsigned long group_action = 0;
 	struct device *dev = data;
-	struct iommu_group *group;
 
-	/*
-	 * ADD/DEL call into iommu driver ops if provided, which may
-	 * result in ADD/DEL notifiers to group->notifier
-	 */
 	if (action == BUS_NOTIFY_ADD_DEVICE) {
 		int ret;
 
@@ -1668,34 +1621,6 @@ static int iommu_bus_notifier(struct notifier_block *nb,
 		return NOTIFY_OK;
 	}
 
-	/*
-	 * Remaining BUS_NOTIFYs get filtered and republished to the
-	 * group, if anyone is listening
-	 */
-	group = iommu_group_get(dev);
-	if (!group)
-		return 0;
-
-	switch (action) {
-	case BUS_NOTIFY_BIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
-		break;
-	case BUS_NOTIFY_BOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
-		break;
-	}
-
-	if (group_action)
-		blocking_notifier_call_chain(&group->notifier,
-					     group_action, dev);
-
-	iommu_group_put(group);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma
  2022-02-28  0:50 ` [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma Lu Baolu
@ 2022-02-28 19:54   ` Bjorn Helgaas
  0 siblings, 0 replies; 27+ messages in thread
From: Bjorn Helgaas @ 2022-02-28 19:54 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj, kvm,
	rafael, David Airlie, linux-pci, Thierry Reding, Diana Craciun,
	Dmitry Osipenko, Will Deacon, Stuart Yoder, Jonathan Hunter,
	Chaitanya Kulkarni, Dan Williams, Cornelia Huck, linux-kernel,
	Li Yang, iommu, Jacob jun Pan, Daniel Vetter, Robin Murphy

On Mon, Feb 28, 2022 at 08:50:50AM +0800, Lu Baolu wrote:
> The current VFIO implementation allows pci-stub driver to be bound to
> a PCI device with other devices in the same IOMMU group being assigned
> to userspace. The pci-stub driver has no dependencies on DMA or the
> IOVA mapping of the device, but it does prevent the user from having
> direct access to the device, which is useful in some circumstances.
> 
> The pci_dma_configure() marks the iommu_group as containing only devices
> with kernel drivers that manage DMA. For compatibility with the VFIO
> usage, avoid this default behavior for the pci_stub. This allows the
> pci_stub still able to be used by the admin to block driver binding after
> applying the DMA ownership to VFIO.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/pci/pci-stub.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..d1f4c1ce7bd1 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,7 @@ static struct pci_driver stub_driver = {
>  	.name		= "pci-stub",
>  	.id_table	= NULL,	/* only dynamic id's */
>  	.probe		= pci_stub_probe,
> +	.driver_managed_dma = true,
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 06/11] PCI: portdrv: Set driver_managed_dma
  2022-02-28  0:50 ` [PATCH v7 06/11] PCI: portdrv: " Lu Baolu
@ 2022-02-28 19:56   ` Bjorn Helgaas
  2022-03-01  2:54     ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2022-02-28 19:56 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj, kvm,
	rafael, David Airlie, linux-pci, Thierry Reding, Diana Craciun,
	Dmitry Osipenko, Will Deacon, Stuart Yoder, Jonathan Hunter,
	Chaitanya Kulkarni, Dan Williams, Cornelia Huck, linux-kernel,
	Li Yang, iommu, Jacob jun Pan, Daniel Vetter, Robin Murphy

On Mon, Feb 28, 2022 at 08:50:51AM +0800, Lu Baolu wrote:
> If a switch lacks ACS P2P Request Redirect, a device below the switch can
> bypass the IOMMU and DMA directly to other devices below the switch, so
> all the downstream devices must be in the same IOMMU group as the switch
> itself.
> 
> The existing VFIO framework allows the portdrv driver to be bound to the
> bridge while its downstream devices are assigned to user space. The
> pci_dma_configure() marks the IOMMU group as containing only devices
> with kernel drivers that manage DMA. Avoid this default behavior for the
> portdrv driver in order for compatibility with the current VFIO usage.

It would be nice to explicitly say here how we can look at portdrv
(and pci_stub) and conclude that ".driver_managed_dma = true" is safe.

Otherwise I won't know what kind of future change to portdrv might
make it unsafe.

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/pci/pcie/portdrv_pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..6b2adb678c21 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> +	.driver_managed_dma = true,
> +
>  	.driver.pm	= PCIE_PORTDRV_PM_OPS,
>  };
>  
> -- 
> 2.25.1
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 07/11] vfio: Set DMA ownership for VFIO devices
  2022-02-28  0:50 ` [PATCH v7 07/11] vfio: Set DMA ownership for VFIO devices Lu Baolu
@ 2022-02-28 22:06   ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2022-02-28 22:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Mon, 28 Feb 2022 08:50:52 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Claim group dma ownership when an IOMMU group is set to a container,
> and release the dma ownership once the iommu group is unset from the
> container.
> 
> This change disallows some unsafe bridge drivers to bind to non-ACS
> bridges while devices under them are assigned to user space. This is an
> intentional enhancement and possibly breaks some existing
> configurations. The recommendation to such an affected user would be
> that the previously allowed host bridge driver was unsafe for this use
> case and to continue to enable assignment of devices within that group,
> the driver should be unbound from the bridge device or replaced with the
> pci-stub driver.
> 
> For any bridge driver, we consider it unsafe if it satisfies any of the
> following conditions:
> 
>   1) The bridge driver uses DMA. Calling pci_set_master() or calling any
>      kernel DMA API (dma_map_*() and etc.) is an indicate that the
>      driver is doing DMA.
> 
>   2) If the bridge driver uses MMIO, it should be tolerant to hostile
>      userspace also touching the same MMIO registers via P2P DMA
>      attacks.
> 
> If the bridge driver turns out to be a safe one, it could be used as
> before by setting the driver's .driver_managed_dma field, just like what
> we have done in the pcieport driver.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c     |  1 +
>  drivers/vfio/pci/vfio_pci.c           |  1 +
>  drivers/vfio/platform/vfio_amba.c     |  1 +
>  drivers/vfio/platform/vfio_platform.c |  1 +
>  drivers/vfio/vfio.c                   | 10 +++++++++-
>  5 files changed, 13 insertions(+), 1 deletion(-)

Acked-by: Alex Williamson <alex.williamson@redhat.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 08/11] vfio: Remove use of vfio_group_viable()
  2022-02-28  0:50 ` [PATCH v7 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
@ 2022-02-28 22:06   ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2022-02-28 22:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Mon, 28 Feb 2022 08:50:53 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> As DMA ownership is claimed for the iommu group when a VFIO group is
> added to a VFIO container, the VFIO group viability is guaranteed as long
> as group->container_users > 0. Remove those unnecessary group viability
> checks which are only hit when group->container_users is not zero.
> 
> The only remaining reference is in GROUP_GET_STATUS, which could be called
> at any time when group fd is valid. Here we just replace the
> vfio_group_viable() by directly calling IOMMU core to get viability status.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)

Acked-by: Alex Williamson <alex.williamson@redhat.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 09/11] vfio: Delete the unbound_list
  2022-02-28  0:50 ` [PATCH v7 09/11] vfio: Delete the unbound_list Lu Baolu
@ 2022-02-28 22:06   ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2022-02-28 22:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Mon, 28 Feb 2022 08:50:54 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> commit 60720a0fc646 ("vfio: Add device tracking during unbind") added the
> unbound list to plug a problem with KVM where KVM_DEV_VFIO_GROUP_DEL
> relied on vfio_group_get_external_user() succeeding to return the
> vfio_group from a group file descriptor. The unbound list allowed
> vfio_group_get_external_user() to continue to succeed in edge cases.
> 
> However commit 5d6dee80a1e9 ("vfio: New external user group/file match")
> deleted the call to vfio_group_get_external_user() during
> KVM_DEV_VFIO_GROUP_DEL. Instead vfio_external_group_match_file() is used
> to directly match the file descriptor to the group pointer.
> 
> This in turn avoids the call down to vfio_dev_viable() during
> KVM_DEV_VFIO_GROUP_DEL and also avoids the trouble the first commit was
> trying to fix.
> 
> There are no other users of vfio_dev_viable() that care about the time
> after vfio_unregister_group_dev() returns, so simply delete the
> unbound_list entirely.
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/vfio/vfio.c | 74 ++-------------------------------------------
>  1 file changed, 2 insertions(+), 72 deletions(-)

Acked-by: Alex Williamson <alex.williamson@redhat.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 10/11] vfio: Remove iommu group notifier
  2022-02-28  0:50 ` [PATCH v7 10/11] vfio: Remove iommu group notifier Lu Baolu
@ 2022-02-28 22:06   ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2022-02-28 22:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Mon, 28 Feb 2022 08:50:55 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> The iommu core and driver core have been enhanced to avoid unsafe driver
> binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
> has been called. There's no need to register iommu group notifier. This
> removes the iommu group notifer which contains BUG_ON() and WARN().
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 147 --------------------------------------------
>  1 file changed, 147 deletions(-)

Acked-by: Alex Williamson <alex.williamson@redhat.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 06/11] PCI: portdrv: Set driver_managed_dma
  2022-02-28 19:56   ` Bjorn Helgaas
@ 2022-03-01  2:54     ` Lu Baolu
  0 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2022-03-01  2:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj, kvm, rafael, David Airlie, linux-pci, Thierry Reding,
	Diana Craciun, Dmitry Osipenko, Will Deacon, Stuart Yoder,
	Jonathan Hunter, Chaitanya Kulkarni, Dan Williams, Cornelia Huck,
	linux-kernel, Li Yang, iommu, Jacob jun Pan, Daniel Vetter,
	Robin Murphy

Hi Bjorn,

On 3/1/22 3:56 AM, Bjorn Helgaas wrote:
> On Mon, Feb 28, 2022 at 08:50:51AM +0800, Lu Baolu wrote:
>> If a switch lacks ACS P2P Request Redirect, a device below the switch can
>> bypass the IOMMU and DMA directly to other devices below the switch, so
>> all the downstream devices must be in the same IOMMU group as the switch
>> itself.
>>
>> The existing VFIO framework allows the portdrv driver to be bound to the
>> bridge while its downstream devices are assigned to user space. The
>> pci_dma_configure() marks the IOMMU group as containing only devices
>> with kernel drivers that manage DMA. Avoid this default behavior for the
>> portdrv driver in order for compatibility with the current VFIO usage.
> 
> It would be nice to explicitly say here how we can look at portdrv
> (and pci_stub) and conclude that ".driver_managed_dma = true" is safe.
> 
> Otherwise I won't know what kind of future change to portdrv might
> make it unsafe.

Fair enough. We can add below words:

We achieve this by setting ".driver_managed_dma = true" in pci_driver
structure. It is safe because the portdrv driver meets below criteria:

- This driver doesn't use DMA, as you can't find any related calls like
   pci_set_master() or any kernel DMA API (dma_map_*() and etc.).
- It doesn't use MMIO as you can't find ioremap() or similar calls. It's
   tolerant to userspace possibly also touching the same MMIO registers
   via P2P DMA access.

> 
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thank you!

Best regards,
baolu

> 
>> ---
>>   drivers/pci/pcie/portdrv_pci.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>> index 35eca6277a96..6b2adb678c21 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = {
>>   
>>   	.err_handler	= &pcie_portdrv_err_handler,
>>   
>> +	.driver_managed_dma = true,
>> +
>>   	.driver.pm	= PCIE_PORTDRV_PM_OPS,
>>   };
>>   
>> -- 
>> 2.25.1
>>
>> _______________________________________________
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-02-28  0:50 ` [PATCH v7 01/11] iommu: Add DMA ownership management interfaces Lu Baolu
@ 2022-03-04 10:34   ` Eric Auger
  2022-03-04 10:43     ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2022-03-04 10:34 UTC (permalink / raw)
  To: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

Hi Lu,

On 2/28/22 1:50 AM, Lu Baolu wrote:
> Multiple devices may be placed in the same IOMMU group because they
> cannot be isolated from each other. These devices must either be
> entirely under kernel control or userspace control, never a mixture.
>
> This adds dma ownership management in iommu core and exposes several
> interfaces for the device drivers and the device userspace assignment
> framework (i.e. VFIO), so that any conflict between user and kernel
> controlled dma could be detected at the beginning.
>
> The device driver oriented interfaces are,
>
> 	int iommu_device_use_default_domain(struct device *dev);
> 	void iommu_device_unuse_default_domain(struct device *dev);
>
> By calling iommu_device_use_default_domain(), the device driver tells
> the iommu layer that the device dma is handled through the kernel DMA
> APIs. The iommu layer will manage the IOVA and use the default domain
> for DMA address translation.
>
> The device user-space assignment framework oriented interfaces are,
>
> 	int iommu_group_claim_dma_owner(struct iommu_group *group,
> 					void *owner);
> 	void iommu_group_release_dma_owner(struct iommu_group *group);
> 	bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>
> The device userspace assignment must be disallowed if the DMA owner
> claiming interface returns failure.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  include/linux/iommu.h |  31 +++++++++
>  drivers/iommu/iommu.c | 153 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9208eca4b0d1..77972ef978b5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -675,6 +675,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>  void iommu_sva_unbind_device(struct iommu_sva *handle);
>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>  
> +int iommu_device_use_default_domain(struct device *dev);
> +void iommu_device_unuse_default_domain(struct device *dev);
> +
> +int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
> +void iommu_group_release_dma_owner(struct iommu_group *group);
> +bool iommu_group_dma_owner_claimed(struct iommu_group *group);
> +
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -1031,6 +1038,30 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
>  {
>  	return NULL;
>  }
> +
> +static inline int iommu_device_use_default_domain(struct device *dev)
> +{
> +	return 0;
> +}
> +
> +static inline void iommu_device_unuse_default_domain(struct device *dev)
> +{
> +}
> +
> +static inline int
> +iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void iommu_group_release_dma_owner(struct iommu_group *group)
> +{
> +}
> +
> +static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> +{
> +	return false;
> +}
>  #endif /* CONFIG_IOMMU_API */
>  
>  /**
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f2c45b85b9fc..eba8e8ccf19d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -48,6 +48,8 @@ struct iommu_group {
>  	struct iommu_domain *default_domain;
>  	struct iommu_domain *domain;
>  	struct list_head entry;
> +	unsigned int owner_cnt;
> +	void *owner;
>  };
>  
>  struct group_device {
> @@ -294,7 +296,11 @@ int iommu_probe_device(struct device *dev)
>  	mutex_lock(&group->mutex);
>  	iommu_alloc_default_domain(group, dev);
>  
> -	if (group->default_domain) {
> +	/*
> +	 * If device joined an existing group which has been claimed, don't
> +	 * attach the default domain.
> +	 */
> +	if (group->default_domain && !group->owner) {
>  		ret = __iommu_attach_device(group->default_domain, dev);
>  		if (ret) {
>  			mutex_unlock(&group->mutex);
> @@ -2109,7 +2115,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>  {
>  	int ret;
>  
> -	if (group->default_domain && group->domain != group->default_domain)
> +	if (group->domain && group->domain != group->default_domain)
>  		return -EBUSY;
>  
>  	ret = __iommu_group_for_each_dev(group, domain,
> @@ -2146,7 +2152,11 @@ static void __iommu_detach_group(struct iommu_domain *domain,
>  {
>  	int ret;
>  
> -	if (!group->default_domain) {
> +	/*
> +	 * If the group has been claimed already, do not re-attach the default
> +	 * domain.
> +	 */
> +	if (!group->default_domain || group->owner) {
>  		__iommu_group_for_each_dev(group, domain,
>  					   iommu_group_do_detach_device);
>  		group->domain = NULL;
> @@ -3095,3 +3105,140 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>  
>  	return ret;
>  }
> +
> +/**
> + * iommu_device_use_default_domain() - Device driver wants to handle device
> + *                                     DMA through the kernel DMA API.
> + * @dev: The device.
> + *
> + * The device driver about to bind @dev wants to do DMA through the kernel
> + * DMA API. Return 0 if it is allowed, otherwise an error.
> + */
> +int iommu_device_use_default_domain(struct device *dev)
> +{
> +	struct iommu_group *group = iommu_group_get(dev);
> +	int ret = 0;
> +
> +	if (!group)
> +		return 0;
I hit a WARN_ON() when unbinding an e1000e driver just after boot:

sudo modprobe -v vfio-pci
echo vfio-pci | sudo tee -a
/sys/bus/pci/devices/0004:01:00.0/driver_override
vfio-pci
echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind


[  390.042811] ------------[ cut here ]------------
[  390.046468] WARNING: CPU: 42 PID: 5589 at drivers/iommu/iommu.c:3123
iommu_device_unuse_default_domain+0x68/0x100
[  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill
sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c mlx5_core sg
mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt
e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform qcom_emac
mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
[  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
5.17.0-rc4-lu-v7-official+ #24
[  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS 0ACJA570
11/05/2018
[  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
[  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
[  390.150894] sp : ffff80000fbb3bc0
[  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
0000000000000000
[  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
ffff03c0c7cc5720
[  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
ffffb4e61df223f8
[  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
0000000000000000
[  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
0000000000000000
[  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
ffff03c0d519cd80
[  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
ffffb4e620b54f8c
[  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
00000000000000ff
[  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
0000000000000000
[  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
0000000000000000
[  390.225374] Call trace:
[  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
[  390.233187]  pci_dma_cleanup+0x38/0x44
[  390.236919]  __device_release_driver+0x1a8/0x260
[  390.241519]  device_driver_detach+0x50/0xd0
[  390.245686]  unbind_store+0xf8/0x120
[  390.249245]  drv_attr_store+0x30/0x44
[  390.252891]  sysfs_kf_write+0x50/0x60
[  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
[  390.260964]  new_sync_write+0xf0/0x18c
[  390.264696]  vfs_write+0x230/0x2d0
[  390.268082]  ksys_write+0x74/0x100
[  390.271467]  __arm64_sys_write+0x28/0x3c
[  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
[  390.280061]  el0_svc_common.constprop.0+0x160/0x164
[  390.284922]  do_el0_svc+0x34/0xcc
[  390.288221]  el0_svc+0x30/0x140
[  390.291346]  el0t_64_sync_handler+0xa4/0x130
[  390.295599]  el0t_64_sync+0x1a0/0x1a4
[  390.299245] ---[ end trace 0000000000000000 ]---


I put some traces in the code and I can see that iommu_device_use_default_domain() effectively is called on 0004:01:00.0 e1000e device on pci_dma_configure() but at that time the iommu group is NULL:
[   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure driver_managed_area=0
[   10.569431] e1000e 0004:01:00.0: **** iommu_device_use_default_domain ENTRY
[   10.569433] e1000e 0004:01:00.0: **** iommu_device_use_default_domain no group
[   10.569435] e1000e 0004:01:00.0: pci_dma_configure iommu_device_use_default_domain returned 0
[   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3

^^^the group is added after the 
iommu_device_use_default_domain() call
So the group->owner_cnt is not incremented as expected.

Thanks

Eric

> +
> +	mutex_lock(&group->mutex);
> +	if (group->owner_cnt) {
> +		if (group->domain != group->default_domain ||
> +		    group->owner) {
> +			ret = -EBUSY;
> +			goto unlock_out;
> +		}
> +	}
> +
> +	group->owner_cnt++;
> +
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +
> +/**
> + * iommu_device_unuse_default_domain() - Device driver stops handling device
> + *                                       DMA through the kernel DMA API.
> + * @dev: The device.
> + *
> + * The device driver doesn't want to do DMA through kernel DMA API anymore.
> + * It must be called after iommu_device_use_default_domain().
> + */
> +void iommu_device_unuse_default_domain(struct device *dev)
> +{
> +	struct iommu_group *group = iommu_group_get(dev);
> +
> +	if (!group)
> +		return;
> +
> +	mutex_lock(&group->mutex);
> +	if (!WARN_ON(!group->owner_cnt))
> +		group->owner_cnt--;
> +
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +}
> +
> +/**
> + * iommu_group_claim_dma_owner() - Set DMA ownership of a group
> + * @group: The group.
> + * @owner: Caller specified pointer. Used for exclusive ownership.
> + *
> + * This is to support backward compatibility for vfio which manages
> + * the dma ownership in iommu_group level. New invocations on this
> + * interface should be prohibited.
> + */
> +int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&group->mutex);
> +	if (group->owner_cnt) {
> +		ret = -EPERM;
> +		goto unlock_out;
> +	} else {
> +		if (group->domain && group->domain != group->default_domain) {
> +			ret = -EBUSY;
> +			goto unlock_out;
> +		}
> +
> +		group->owner = owner;
> +		if (group->domain)
> +			__iommu_detach_group(group->domain, group);
> +	}
> +
> +	group->owner_cnt++;
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
> +
> +/**
> + * iommu_group_release_dma_owner() - Release DMA ownership of a group
> + * @group: The group.
> + *
> + * Release the DMA ownership claimed by iommu_group_claim_dma_owner().
> + */
> +void iommu_group_release_dma_owner(struct iommu_group *group)
> +{
> +	mutex_lock(&group->mutex);
> +	if (WARN_ON(!group->owner_cnt || !group->owner))
> +		goto unlock_out;
> +
> +	group->owner_cnt = 0;
> +	/*
> +	 * The UNMANAGED domain should be detached before all USER
> +	 * owners have been released.
> +	 */
> +	if (!WARN_ON(group->domain) && group->default_domain)
> +		__iommu_attach_group(group->default_domain, group);
> +	group->owner = NULL;
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
> +
> +/**
> + * iommu_group_dma_owner_claimed() - Query group dma ownership status
> + * @group: The group.
> + *
> + * This provides status query on a given group. It is racy and only for
> + * non-binding status reporting.
> + */
> +bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> +{
> +	unsigned int user;
> +
> +	mutex_lock(&group->mutex);
> +	user = group->owner_cnt;
> +	mutex_unlock(&group->mutex);
> +
> +	return user;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-04 10:34   ` Eric Auger
@ 2022-03-04 10:43     ` Lu Baolu
  2022-03-04 12:22       ` Robin Murphy
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-03-04 10:43 UTC (permalink / raw)
  To: eric.auger, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj
  Cc: baolu.lu, Will Deacon, Robin Murphy, Dan Williams, rafael,
	Diana Craciun, Cornelia Huck, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Eric,

On 2022/3/4 18:34, Eric Auger wrote:
> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
> 
> sudo modprobe -v vfio-pci
> echo vfio-pci | sudo tee -a
> /sys/bus/pci/devices/0004:01:00.0/driver_override
> vfio-pci
> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
> 
> 
> [  390.042811] ------------[ cut here ]------------
> [  390.046468] WARNING: CPU: 42 PID: 5589 at drivers/iommu/iommu.c:3123
> iommu_device_unuse_default_domain+0x68/0x100
> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill
> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c mlx5_core sg
> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt
> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform qcom_emac
> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
> 5.17.0-rc4-lu-v7-official+ #24
> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS 0ACJA570
> 11/05/2018
> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
> [  390.150894] sp : ffff80000fbb3bc0
> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
> 0000000000000000
> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
> ffff03c0c7cc5720
> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
> ffffb4e61df223f8
> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
> 0000000000000000
> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
> 0000000000000000
> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
> ffff03c0d519cd80
> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
> ffffb4e620b54f8c
> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
> 00000000000000ff
> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
> 0000000000000000
> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
> 0000000000000000
> [  390.225374] Call trace:
> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
> [  390.233187]  pci_dma_cleanup+0x38/0x44
> [  390.236919]  __device_release_driver+0x1a8/0x260
> [  390.241519]  device_driver_detach+0x50/0xd0
> [  390.245686]  unbind_store+0xf8/0x120
> [  390.249245]  drv_attr_store+0x30/0x44
> [  390.252891]  sysfs_kf_write+0x50/0x60
> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
> [  390.260964]  new_sync_write+0xf0/0x18c
> [  390.264696]  vfs_write+0x230/0x2d0
> [  390.268082]  ksys_write+0x74/0x100
> [  390.271467]  __arm64_sys_write+0x28/0x3c
> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
> [  390.284922]  do_el0_svc+0x34/0xcc
> [  390.288221]  el0_svc+0x30/0x140
> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
> [  390.299245] ---[ end trace 0000000000000000 ]---
> 
> 
> I put some traces in the code and I can see that iommu_device_use_default_domain() effectively is called on 0004:01:00.0 e1000e device on pci_dma_configure() but at that time the iommu group is NULL:
> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure driver_managed_area=0
> [   10.569431] e1000e 0004:01:00.0: **** iommu_device_use_default_domain ENTRY
> [   10.569433] e1000e 0004:01:00.0: **** iommu_device_use_default_domain no group
> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure iommu_device_use_default_domain returned 0
> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
> 
> ^^^the group is added after the
> iommu_device_use_default_domain() call
> So the group->owner_cnt is not incremented as expected.

Thank you for reporting this. Do you have any idea why the driver is
loaded before iommu_probe_device()?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-04 10:43     ` Lu Baolu
@ 2022-03-04 12:22       ` Robin Murphy
  2022-03-04 13:55         ` Eric Auger
  0 siblings, 1 reply; 27+ messages in thread
From: Robin Murphy @ 2022-03-04 12:22 UTC (permalink / raw)
  To: Lu Baolu, eric.auger, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder,
	Laurentiu Tudor, Thierry Reding, David Airlie, Daniel Vetter,
	Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm,
	linux-kernel

On 2022-03-04 10:43, Lu Baolu wrote:
> Hi Eric,
> 
> On 2022/3/4 18:34, Eric Auger wrote:
>> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
>>
>> sudo modprobe -v vfio-pci
>> echo vfio-pci | sudo tee -a
>> /sys/bus/pci/devices/0004:01:00.0/driver_override
>> vfio-pci
>> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
>>
>>
>> [  390.042811] ------------[ cut here ]------------
>> [  390.046468] WARNING: CPU: 42 PID: 5589 at drivers/iommu/iommu.c:3123
>> iommu_device_unuse_default_domain+0x68/0x100
>> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
>> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
>> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill
>> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
>> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c mlx5_core sg
>> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt
>> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform qcom_emac
>> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
>> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
>> 5.17.0-rc4-lu-v7-official+ #24
>> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
>> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS 0ACJA570
>> 11/05/2018
>> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
>> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
>> [  390.150894] sp : ffff80000fbb3bc0
>> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
>> 0000000000000000
>> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
>> ffff03c0c7cc5720
>> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
>> ffffb4e61df223f8
>> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
>> 0000000000000000
>> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
>> 0000000000000000
>> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
>> ffff03c0d519cd80
>> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
>> ffffb4e620b54f8c
>> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
>> 00000000000000ff
>> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
>> 0000000000000000
>> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
>> 0000000000000000
>> [  390.225374] Call trace:
>> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
>> [  390.233187]  pci_dma_cleanup+0x38/0x44
>> [  390.236919]  __device_release_driver+0x1a8/0x260
>> [  390.241519]  device_driver_detach+0x50/0xd0
>> [  390.245686]  unbind_store+0xf8/0x120
>> [  390.249245]  drv_attr_store+0x30/0x44
>> [  390.252891]  sysfs_kf_write+0x50/0x60
>> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
>> [  390.260964]  new_sync_write+0xf0/0x18c
>> [  390.264696]  vfs_write+0x230/0x2d0
>> [  390.268082]  ksys_write+0x74/0x100
>> [  390.271467]  __arm64_sys_write+0x28/0x3c
>> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
>> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
>> [  390.284922]  do_el0_svc+0x34/0xcc
>> [  390.288221]  el0_svc+0x30/0x140
>> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
>> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
>> [  390.299245] ---[ end trace 0000000000000000 ]---
>>
>>
>> I put some traces in the code and I can see that 
>> iommu_device_use_default_domain() effectively is called on 
>> 0004:01:00.0 e1000e device on pci_dma_configure() but at that time the 
>> iommu group is NULL:
>> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure 
>> driver_managed_area=0
>> [   10.569431] e1000e 0004:01:00.0: **** 
>> iommu_device_use_default_domain ENTRY
>> [   10.569433] e1000e 0004:01:00.0: **** 
>> iommu_device_use_default_domain no group
>> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure 
>> iommu_device_use_default_domain returned 0
>> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
>>
>> ^^^the group is added after the
>> iommu_device_use_default_domain() call
>> So the group->owner_cnt is not incremented as expected.
> 
> Thank you for reporting this. Do you have any idea why the driver is
> loaded before iommu_probe_device()?

Urgh, this is the horrible firmware-data-ordering thing again. The stuff 
I've been saying about having to rework the whole .dma_configure 
mechanism in the near future is to fix this properly.

The summary is that in patch #4, calling 
iommu_device_use_default_domain() *before* {of,acpi}_dma_configure is 
currently a problem. As things stand, the IOMMU driver ignored the 
initial iommu_probe_device() call when the device was added, since at 
that point it had no fwspec yet. In this situation, 
{of,acpi}_iommu_configure() are retriggering iommu_probe_device() after 
the IOMMU driver has seen the firmware data via .of_xlate to learn that 
it it actually responsible for the given device.

Robin.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-04 12:22       ` Robin Murphy
@ 2022-03-04 13:55         ` Eric Auger
  2022-03-04 14:10           ` Robin Murphy
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2022-03-04 13:55 UTC (permalink / raw)
  To: Robin Murphy, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder,
	Laurentiu Tudor, Thierry Reding, David Airlie, Daniel Vetter,
	Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm,
	linux-kernel

Hi Robin,

On 3/4/22 1:22 PM, Robin Murphy wrote:
> On 2022-03-04 10:43, Lu Baolu wrote:
>> Hi Eric,
>>
>> On 2022/3/4 18:34, Eric Auger wrote:
>>> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
>>>
>>> sudo modprobe -v vfio-pci
>>> echo vfio-pci | sudo tee -a
>>> /sys/bus/pci/devices/0004:01:00.0/driver_override
>>> vfio-pci
>>> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
>>>
>>>
>>> [  390.042811] ------------[ cut here ]------------
>>> [  390.046468] WARNING: CPU: 42 PID: 5589 at drivers/iommu/iommu.c:3123
>>> iommu_device_unuse_default_domain+0x68/0x100
>>> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
>>> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
>>> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill
>>> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
>>> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c
>>> mlx5_core sg
>>> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt
>>> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform
>>> qcom_emac
>>> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
>>> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
>>> 5.17.0-rc4-lu-v7-official+ #24
>>> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
>>> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS
>>> 0ACJA570
>>> 11/05/2018
>>> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
>>> BTYPE=--)
>>> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
>>> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
>>> [  390.150894] sp : ffff80000fbb3bc0
>>> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
>>> 0000000000000000
>>> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
>>> ffff03c0c7cc5720
>>> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
>>> ffffb4e61df223f8
>>> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
>>> 0000000000000000
>>> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
>>> 0000000000000000
>>> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
>>> ffff03c0d519cd80
>>> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
>>> ffffb4e620b54f8c
>>> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
>>> 00000000000000ff
>>> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
>>> 0000000000000000
>>> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
>>> 0000000000000000
>>> [  390.225374] Call trace:
>>> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
>>> [  390.233187]  pci_dma_cleanup+0x38/0x44
>>> [  390.236919]  __device_release_driver+0x1a8/0x260
>>> [  390.241519]  device_driver_detach+0x50/0xd0
>>> [  390.245686]  unbind_store+0xf8/0x120
>>> [  390.249245]  drv_attr_store+0x30/0x44
>>> [  390.252891]  sysfs_kf_write+0x50/0x60
>>> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
>>> [  390.260964]  new_sync_write+0xf0/0x18c
>>> [  390.264696]  vfs_write+0x230/0x2d0
>>> [  390.268082]  ksys_write+0x74/0x100
>>> [  390.271467]  __arm64_sys_write+0x28/0x3c
>>> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
>>> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
>>> [  390.284922]  do_el0_svc+0x34/0xcc
>>> [  390.288221]  el0_svc+0x30/0x140
>>> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
>>> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
>>> [  390.299245] ---[ end trace 0000000000000000 ]---
>>>
>>>
>>> I put some traces in the code and I can see that
>>> iommu_device_use_default_domain() effectively is called on
>>> 0004:01:00.0 e1000e device on pci_dma_configure() but at that time
>>> the iommu group is NULL:
>>> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure
>>> driver_managed_area=0
>>> [   10.569431] e1000e 0004:01:00.0: ****
>>> iommu_device_use_default_domain ENTRY
>>> [   10.569433] e1000e 0004:01:00.0: ****
>>> iommu_device_use_default_domain no group
>>> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure
>>> iommu_device_use_default_domain returned 0
>>> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
>>>
>>> ^^^the group is added after the
>>> iommu_device_use_default_domain() call
>>> So the group->owner_cnt is not incremented as expected.
>>
>> Thank you for reporting this. Do you have any idea why the driver is
>> loaded before iommu_probe_device()?
>
> Urgh, this is the horrible firmware-data-ordering thing again. The
> stuff I've been saying about having to rework the whole .dma_configure
> mechanism in the near future is to fix this properly.
>
> The summary is that in patch #4, calling
> iommu_device_use_default_domain() *before* {of,acpi}_dma_configure is
> currently a problem. As things stand, the IOMMU driver ignored the
> initial iommu_probe_device() call when the device was added, since at
> that point it had no fwspec yet. In this situation,
> {of,acpi}_iommu_configure() are retriggering iommu_probe_device()
> after the IOMMU driver has seen the firmware data via .of_xlate to
> learn that it it actually responsible for the given device.

thank you for providing the info. Hope this is something Lu can work around.

Thanks!

Eric
>
> Robin.
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-04 13:55         ` Eric Auger
@ 2022-03-04 14:10           ` Robin Murphy
  2022-03-07  3:27             ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Robin Murphy @ 2022-03-04 14:10 UTC (permalink / raw)
  To: eric.auger, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Chaitanya Kulkarni, kvm, Stuart Yoder, rafael, David Airlie,
	linux-pci, Cornelia Huck, linux-kernel, Jonathan Hunter, iommu,
	Thierry Reding, Jacob jun Pan, Daniel Vetter, Diana Craciun,
	Dan Williams, Li Yang, Will Deacon, Dmitry Osipenko

On 2022-03-04 13:55, Eric Auger wrote:
> Hi Robin,
> 
> On 3/4/22 1:22 PM, Robin Murphy wrote:
>> On 2022-03-04 10:43, Lu Baolu wrote:
>>> Hi Eric,
>>>
>>> On 2022/3/4 18:34, Eric Auger wrote:
>>>> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
>>>>
>>>> sudo modprobe -v vfio-pci
>>>> echo vfio-pci | sudo tee -a
>>>> /sys/bus/pci/devices/0004:01:00.0/driver_override
>>>> vfio-pci
>>>> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
>>>>
>>>>
>>>> [  390.042811] ------------[ cut here ]------------
>>>> [  390.046468] WARNING: CPU: 42 PID: 5589 at drivers/iommu/iommu.c:3123
>>>> iommu_device_unuse_default_domain+0x68/0x100
>>>> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
>>>> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
>>>> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
>>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill
>>>> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
>>>> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c
>>>> mlx5_core sg
>>>> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt
>>>> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform
>>>> qcom_emac
>>>> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
>>>> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
>>>> 5.17.0-rc4-lu-v7-official+ #24
>>>> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
>>>> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS
>>>> 0ACJA570
>>>> 11/05/2018
>>>> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
>>>> BTYPE=--)
>>>> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
>>>> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
>>>> [  390.150894] sp : ffff80000fbb3bc0
>>>> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
>>>> 0000000000000000
>>>> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
>>>> ffff03c0c7cc5720
>>>> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
>>>> ffffb4e61df223f8
>>>> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
>>>> 0000000000000000
>>>> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
>>>> 0000000000000000
>>>> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
>>>> ffff03c0d519cd80
>>>> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
>>>> ffffb4e620b54f8c
>>>> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
>>>> 00000000000000ff
>>>> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
>>>> 0000000000000000
>>>> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
>>>> 0000000000000000
>>>> [  390.225374] Call trace:
>>>> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
>>>> [  390.233187]  pci_dma_cleanup+0x38/0x44
>>>> [  390.236919]  __device_release_driver+0x1a8/0x260
>>>> [  390.241519]  device_driver_detach+0x50/0xd0
>>>> [  390.245686]  unbind_store+0xf8/0x120
>>>> [  390.249245]  drv_attr_store+0x30/0x44
>>>> [  390.252891]  sysfs_kf_write+0x50/0x60
>>>> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
>>>> [  390.260964]  new_sync_write+0xf0/0x18c
>>>> [  390.264696]  vfs_write+0x230/0x2d0
>>>> [  390.268082]  ksys_write+0x74/0x100
>>>> [  390.271467]  __arm64_sys_write+0x28/0x3c
>>>> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
>>>> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
>>>> [  390.284922]  do_el0_svc+0x34/0xcc
>>>> [  390.288221]  el0_svc+0x30/0x140
>>>> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
>>>> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
>>>> [  390.299245] ---[ end trace 0000000000000000 ]---
>>>>
>>>>
>>>> I put some traces in the code and I can see that
>>>> iommu_device_use_default_domain() effectively is called on
>>>> 0004:01:00.0 e1000e device on pci_dma_configure() but at that time
>>>> the iommu group is NULL:
>>>> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure
>>>> driver_managed_area=0
>>>> [   10.569431] e1000e 0004:01:00.0: ****
>>>> iommu_device_use_default_domain ENTRY
>>>> [   10.569433] e1000e 0004:01:00.0: ****
>>>> iommu_device_use_default_domain no group
>>>> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure
>>>> iommu_device_use_default_domain returned 0
>>>> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
>>>>
>>>> ^^^the group is added after the
>>>> iommu_device_use_default_domain() call
>>>> So the group->owner_cnt is not incremented as expected.
>>>
>>> Thank you for reporting this. Do you have any idea why the driver is
>>> loaded before iommu_probe_device()?
>>
>> Urgh, this is the horrible firmware-data-ordering thing again. The
>> stuff I've been saying about having to rework the whole .dma_configure
>> mechanism in the near future is to fix this properly.
>>
>> The summary is that in patch #4, calling
>> iommu_device_use_default_domain() *before* {of,acpi}_dma_configure is
>> currently a problem. As things stand, the IOMMU driver ignored the
>> initial iommu_probe_device() call when the device was added, since at
>> that point it had no fwspec yet. In this situation,
>> {of,acpi}_iommu_configure() are retriggering iommu_probe_device()
>> after the IOMMU driver has seen the firmware data via .of_xlate to
>> learn that it it actually responsible for the given device.
> 
> thank you for providing the info. Hope this is something Lu can work around.

Hopefully it's just a case of flipping the calls around, so that 
iommu_use_default_domain() goes at the end, and calls 
arch_teardown_dma_ops() if it fails. From a quick skim I *think* that 
should still work out to the desired behaviour (or at least close enough 
that we can move forward without a circular dependency between fixes...)

Robin.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-04 14:10           ` Robin Murphy
@ 2022-03-07  3:27             ` Lu Baolu
  2022-03-07 12:42               ` Eric Auger
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2022-03-07  3:27 UTC (permalink / raw)
  To: Robin Murphy, eric.auger, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: baolu.lu, Chaitanya Kulkarni, kvm, Stuart Yoder, rafael,
	David Airlie, linux-pci, Cornelia Huck, linux-kernel,
	Jonathan Hunter, iommu, Thierry Reding, Jacob jun Pan,
	Daniel Vetter, Diana Craciun, Dan Williams, Li Yang, Will Deacon,
	Dmitry Osipenko

Hi Robin,

On 3/4/22 10:10 PM, Robin Murphy wrote:
> On 2022-03-04 13:55, Eric Auger wrote:
>> Hi Robin,
>>
>> On 3/4/22 1:22 PM, Robin Murphy wrote:
>>> On 2022-03-04 10:43, Lu Baolu wrote:
>>>> Hi Eric,
>>>>
>>>> On 2022/3/4 18:34, Eric Auger wrote:
>>>>> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
>>>>>
>>>>> sudo modprobe -v vfio-pci
>>>>> echo vfio-pci | sudo tee -a
>>>>> /sys/bus/pci/devices/0004:01:00.0/driver_override
>>>>> vfio-pci
>>>>> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
>>>>>
>>>>>
>>>>> [  390.042811] ------------[ cut here ]------------
>>>>> [  390.046468] WARNING: CPU: 42 PID: 5589 at 
>>>>> drivers/iommu/iommu.c:3123
>>>>> iommu_device_unuse_default_domain+0x68/0x100
>>>>> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
>>>>> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack 
>>>>> ipt_REJECT
>>>>> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
>>>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc 
>>>>> rfkill
>>>>> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
>>>>> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c
>>>>> mlx5_core sg
>>>>> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt
>>>>> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform
>>>>> qcom_emac
>>>>> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
>>>>> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
>>>>> 5.17.0-rc4-lu-v7-official+ #24
>>>>> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
>>>>> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS
>>>>> 0ACJA570
>>>>> 11/05/2018
>>>>> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
>>>>> BTYPE=--)
>>>>> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
>>>>> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
>>>>> [  390.150894] sp : ffff80000fbb3bc0
>>>>> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
>>>>> 0000000000000000
>>>>> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
>>>>> ffff03c0c7cc5720
>>>>> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
>>>>> ffffb4e61df223f8
>>>>> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
>>>>> 0000000000000000
>>>>> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
>>>>> 0000000000000000
>>>>> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
>>>>> ffff03c0d519cd80
>>>>> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
>>>>> ffffb4e620b54f8c
>>>>> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
>>>>> 00000000000000ff
>>>>> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
>>>>> 0000000000000000
>>>>> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
>>>>> 0000000000000000
>>>>> [  390.225374] Call trace:
>>>>> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
>>>>> [  390.233187]  pci_dma_cleanup+0x38/0x44
>>>>> [  390.236919]  __device_release_driver+0x1a8/0x260
>>>>> [  390.241519]  device_driver_detach+0x50/0xd0
>>>>> [  390.245686]  unbind_store+0xf8/0x120
>>>>> [  390.249245]  drv_attr_store+0x30/0x44
>>>>> [  390.252891]  sysfs_kf_write+0x50/0x60
>>>>> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
>>>>> [  390.260964]  new_sync_write+0xf0/0x18c
>>>>> [  390.264696]  vfs_write+0x230/0x2d0
>>>>> [  390.268082]  ksys_write+0x74/0x100
>>>>> [  390.271467]  __arm64_sys_write+0x28/0x3c
>>>>> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
>>>>> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
>>>>> [  390.284922]  do_el0_svc+0x34/0xcc
>>>>> [  390.288221]  el0_svc+0x30/0x140
>>>>> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
>>>>> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
>>>>> [  390.299245] ---[ end trace 0000000000000000 ]---
>>>>>
>>>>>
>>>>> I put some traces in the code and I can see that
>>>>> iommu_device_use_default_domain() effectively is called on
>>>>> 0004:01:00.0 e1000e device on pci_dma_configure() but at that time
>>>>> the iommu group is NULL:
>>>>> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure
>>>>> driver_managed_area=0
>>>>> [   10.569431] e1000e 0004:01:00.0: ****
>>>>> iommu_device_use_default_domain ENTRY
>>>>> [   10.569433] e1000e 0004:01:00.0: ****
>>>>> iommu_device_use_default_domain no group
>>>>> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure
>>>>> iommu_device_use_default_domain returned 0
>>>>> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
>>>>>
>>>>> ^^^the group is added after the
>>>>> iommu_device_use_default_domain() call
>>>>> So the group->owner_cnt is not incremented as expected.
>>>>
>>>> Thank you for reporting this. Do you have any idea why the driver is
>>>> loaded before iommu_probe_device()?
>>>
>>> Urgh, this is the horrible firmware-data-ordering thing again. The
>>> stuff I've been saying about having to rework the whole .dma_configure
>>> mechanism in the near future is to fix this properly.
>>>
>>> The summary is that in patch #4, calling
>>> iommu_device_use_default_domain() *before* {of,acpi}_dma_configure is
>>> currently a problem. As things stand, the IOMMU driver ignored the
>>> initial iommu_probe_device() call when the device was added, since at
>>> that point it had no fwspec yet. In this situation,
>>> {of,acpi}_iommu_configure() are retriggering iommu_probe_device()
>>> after the IOMMU driver has seen the firmware data via .of_xlate to
>>> learn that it it actually responsible for the given device.
>>
>> thank you for providing the info. Hope this is something Lu can work 
>> around.
> 
> Hopefully it's just a case of flipping the calls around, so that 
> iommu_use_default_domain() goes at the end, and calls 
> arch_teardown_dma_ops() if it fails. From a quick skim I *think* that 
> should still work out to the desired behaviour (or at least close enough 
> that we can move forward without a circular dependency between fixes...)

This is a reasonable solution to me. Thank you for the information and
suggestion.

Eric, I have updated the patch #4 and uploaded a new version here:

https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v8

Can you please give it a try?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-07  3:27             ` Lu Baolu
@ 2022-03-07 12:42               ` Eric Auger
  2022-03-08  0:31                 ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2022-03-07 12:42 UTC (permalink / raw)
  To: Lu Baolu, Robin Murphy, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Chaitanya Kulkarni, kvm, Stuart Yoder, rafael, David Airlie,
	linux-pci, Cornelia Huck, linux-kernel, Jonathan Hunter, iommu,
	Thierry Reding, Jacob jun Pan, Daniel Vetter, Diana Craciun,
	Dan Williams, Li Yang, Will Deacon, Dmitry Osipenko

Hi Lu,

On 3/7/22 4:27 AM, Lu Baolu wrote:
> Hi Robin,
>
> On 3/4/22 10:10 PM, Robin Murphy wrote:
>> On 2022-03-04 13:55, Eric Auger wrote:
>>> Hi Robin,
>>>
>>> On 3/4/22 1:22 PM, Robin Murphy wrote:
>>>> On 2022-03-04 10:43, Lu Baolu wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 2022/3/4 18:34, Eric Auger wrote:
>>>>>> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
>>>>>>
>>>>>> sudo modprobe -v vfio-pci
>>>>>> echo vfio-pci | sudo tee -a
>>>>>> /sys/bus/pci/devices/0004:01:00.0/driver_override
>>>>>> vfio-pci
>>>>>> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
>>>>>>
>>>>>>
>>>>>> [  390.042811] ------------[ cut here ]------------
>>>>>> [  390.046468] WARNING: CPU: 42 PID: 5589 at
>>>>>> drivers/iommu/iommu.c:3123
>>>>>> iommu_device_unuse_default_domain+0x68/0x100
>>>>>> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
>>>>>> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack
>>>>>> ipt_REJECT
>>>>>> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
>>>>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc
>>>>>> rfkill
>>>>>> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
>>>>>> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c
>>>>>> mlx5_core sg
>>>>>> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce
>>>>>> sbsa_gwdt
>>>>>> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform
>>>>>> qcom_emac
>>>>>> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
>>>>>> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
>>>>>> 5.17.0-rc4-lu-v7-official+ #24
>>>>>> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
>>>>>> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS
>>>>>> 0ACJA570
>>>>>> 11/05/2018
>>>>>> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
>>>>>> BTYPE=--)
>>>>>> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
>>>>>> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
>>>>>> [  390.150894] sp : ffff80000fbb3bc0
>>>>>> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
>>>>>> 0000000000000000
>>>>>> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
>>>>>> ffff03c0c7cc5720
>>>>>> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
>>>>>> ffffb4e61df223f8
>>>>>> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
>>>>>> 0000000000000000
>>>>>> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
>>>>>> 0000000000000000
>>>>>> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
>>>>>> ffff03c0d519cd80
>>>>>> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
>>>>>> ffffb4e620b54f8c
>>>>>> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
>>>>>> 00000000000000ff
>>>>>> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
>>>>>> 0000000000000000
>>>>>> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
>>>>>> 0000000000000000
>>>>>> [  390.225374] Call trace:
>>>>>> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
>>>>>> [  390.233187]  pci_dma_cleanup+0x38/0x44
>>>>>> [  390.236919]  __device_release_driver+0x1a8/0x260
>>>>>> [  390.241519]  device_driver_detach+0x50/0xd0
>>>>>> [  390.245686]  unbind_store+0xf8/0x120
>>>>>> [  390.249245]  drv_attr_store+0x30/0x44
>>>>>> [  390.252891]  sysfs_kf_write+0x50/0x60
>>>>>> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
>>>>>> [  390.260964]  new_sync_write+0xf0/0x18c
>>>>>> [  390.264696]  vfs_write+0x230/0x2d0
>>>>>> [  390.268082]  ksys_write+0x74/0x100
>>>>>> [  390.271467]  __arm64_sys_write+0x28/0x3c
>>>>>> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
>>>>>> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
>>>>>> [  390.284922]  do_el0_svc+0x34/0xcc
>>>>>> [  390.288221]  el0_svc+0x30/0x140
>>>>>> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
>>>>>> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
>>>>>> [  390.299245] ---[ end trace 0000000000000000 ]---
>>>>>>
>>>>>>
>>>>>> I put some traces in the code and I can see that
>>>>>> iommu_device_use_default_domain() effectively is called on
>>>>>> 0004:01:00.0 e1000e device on pci_dma_configure() but at that time
>>>>>> the iommu group is NULL:
>>>>>> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure
>>>>>> driver_managed_area=0
>>>>>> [   10.569431] e1000e 0004:01:00.0: ****
>>>>>> iommu_device_use_default_domain ENTRY
>>>>>> [   10.569433] e1000e 0004:01:00.0: ****
>>>>>> iommu_device_use_default_domain no group
>>>>>> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure
>>>>>> iommu_device_use_default_domain returned 0
>>>>>> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
>>>>>>
>>>>>> ^^^the group is added after the
>>>>>> iommu_device_use_default_domain() call
>>>>>> So the group->owner_cnt is not incremented as expected.
>>>>>
>>>>> Thank you for reporting this. Do you have any idea why the driver is
>>>>> loaded before iommu_probe_device()?
>>>>
>>>> Urgh, this is the horrible firmware-data-ordering thing again. The
>>>> stuff I've been saying about having to rework the whole .dma_configure
>>>> mechanism in the near future is to fix this properly.
>>>>
>>>> The summary is that in patch #4, calling
>>>> iommu_device_use_default_domain() *before* {of,acpi}_dma_configure is
>>>> currently a problem. As things stand, the IOMMU driver ignored the
>>>> initial iommu_probe_device() call when the device was added, since at
>>>> that point it had no fwspec yet. In this situation,
>>>> {of,acpi}_iommu_configure() are retriggering iommu_probe_device()
>>>> after the IOMMU driver has seen the firmware data via .of_xlate to
>>>> learn that it it actually responsible for the given device.
>>>
>>> thank you for providing the info. Hope this is something Lu can work
>>> around.
>>
>> Hopefully it's just a case of flipping the calls around, so that
>> iommu_use_default_domain() goes at the end, and calls
>> arch_teardown_dma_ops() if it fails. From a quick skim I *think* that
>> should still work out to the desired behaviour (or at least close
>> enough that we can move forward without a circular dependency between
>> fixes...)
>
> This is a reasonable solution to me. Thank you for the information and
> suggestion.
>
> Eric, I have updated the patch #4 and uploaded a new version here:
>
> https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v8

with v8 I do not hit the warning anymore and the owner accounting seems
to work as expected.

Thanks

Eric
>
> Can you please give it a try?
>
> Best regards,
> baolu
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/11] iommu: Add DMA ownership management interfaces
  2022-03-07 12:42               ` Eric Auger
@ 2022-03-08  0:31                 ` Lu Baolu
  0 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2022-03-08  0:31 UTC (permalink / raw)
  To: eric.auger, Robin Murphy, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: baolu.lu, Chaitanya Kulkarni, kvm, Stuart Yoder, rafael,
	David Airlie, linux-pci, Cornelia Huck, linux-kernel,
	Jonathan Hunter, iommu, Thierry Reding, Jacob jun Pan,
	Daniel Vetter, Diana Craciun, Dan Williams, Li Yang, Will Deacon,
	Dmitry Osipenko

On 2022/3/7 20:42, Eric Auger wrote:
> Hi Lu,
> 
> On 3/7/22 4:27 AM, Lu Baolu wrote:
>> Hi Robin,
>>
>> On 3/4/22 10:10 PM, Robin Murphy wrote:
>>> On 2022-03-04 13:55, Eric Auger wrote:
>>>> Hi Robin,
>>>>
>>>> On 3/4/22 1:22 PM, Robin Murphy wrote:
>>>>> On 2022-03-04 10:43, Lu Baolu wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> On 2022/3/4 18:34, Eric Auger wrote:
>>>>>>> I hit a WARN_ON() when unbinding an e1000e driver just after boot:
>>>>>>>
>>>>>>> sudo modprobe -v vfio-pci
>>>>>>> echo vfio-pci | sudo tee -a
>>>>>>> /sys/bus/pci/devices/0004:01:00.0/driver_override
>>>>>>> vfio-pci
>>>>>>> echo 0004:01:00.0 | sudo tee -a  /sys/bus/pci/drivers/e1000e/unbind
>>>>>>>
>>>>>>>
>>>>>>> [  390.042811] ------------[ cut here ]------------
>>>>>>> [  390.046468] WARNING: CPU: 42 PID: 5589 at
>>>>>>> drivers/iommu/iommu.c:3123
>>>>>>> iommu_device_unuse_default_domain+0x68/0x100
>>>>>>> [  390.056710] Modules linked in: vfio_pci vfio_pci_core vfio_virqfd
>>>>>>> vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack
>>>>>>> ipt_REJECT
>>>>>>> nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
>>>>>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc
>>>>>>> rfkill
>>>>>>> sunrpc vfat fat mlx5_ib ib_uverbs ib_core acpi_ipmi ipmi_ssif
>>>>>>> ipmi_devintf ipmi_msghandler cppc_cpufreq drm xfs libcrc32c
>>>>>>> mlx5_core sg
>>>>>>> mlxfw crct10dif_ce tls ghash_ce sha2_ce sha256_arm64 sha1_ce
>>>>>>> sbsa_gwdt
>>>>>>> e1000e psample sdhci_acpi ahci_platform sdhci libahci_platform
>>>>>>> qcom_emac
>>>>>>> mmc_core hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod fuse
>>>>>>> [  390.110618] CPU: 42 PID: 5589 Comm: tee Kdump: loaded Not tainted
>>>>>>> 5.17.0-rc4-lu-v7-official+ #24
>>>>>>> [  390.119384] Hardware name: WIWYNN QDF2400 Reference Evaluation
>>>>>>> Platform CV90-LA115-P120/QDF2400 Customer Reference Board, BIOS
>>>>>>> 0ACJA570
>>>>>>> 11/05/2018
>>>>>>> [  390.132492] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS
>>>>>>> BTYPE=--)
>>>>>>> [  390.139436] pc : iommu_device_unuse_default_domain+0x68/0x100
>>>>>>> [  390.145165] lr : iommu_device_unuse_default_domain+0x38/0x100
>>>>>>> [  390.150894] sp : ffff80000fbb3bc0
>>>>>>> [  390.154193] x29: ffff80000fbb3bc0 x28: ffff03c0cf6b2400 x27:
>>>>>>> 0000000000000000
>>>>>>> [  390.161311] x26: 0000000000000000 x25: 0000000000000000 x24:
>>>>>>> ffff03c0c7cc5720
>>>>>>> [  390.168429] x23: ffff03c0c2b9d150 x22: ffffb4e61df223f8 x21:
>>>>>>> ffffb4e61df223f8
>>>>>>> [  390.175547] x20: ffff03c7c03c3758 x19: ffff03c7c03c3700 x18:
>>>>>>> 0000000000000000
>>>>>>> [  390.182665] x17: 0000000000000000 x16: 0000000000000000 x15:
>>>>>>> 0000000000000000
>>>>>>> [  390.189783] x14: 0000000000000000 x13: 0000000000000030 x12:
>>>>>>> ffff03c0d519cd80
>>>>>>> [  390.196901] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000dc0 x9 :
>>>>>>> ffffb4e620b54f8c
>>>>>>> [  390.204019] x8 : ffff03c0cf6b3220 x7 : ffff4ef132bba000 x6 :
>>>>>>> 00000000000000ff
>>>>>>> [  390.211137] x5 : ffff03c0c2b9f108 x4 : ffff03c0d51f6438 x3 :
>>>>>>> 0000000000000000
>>>>>>> [  390.218255] x2 : ffff03c0cf6b2400 x1 : 0000000000000000 x0 :
>>>>>>> 0000000000000000
>>>>>>> [  390.225374] Call trace:
>>>>>>> [  390.227804]  iommu_device_unuse_default_domain+0x68/0x100
>>>>>>> [  390.233187]  pci_dma_cleanup+0x38/0x44
>>>>>>> [  390.236919]  __device_release_driver+0x1a8/0x260
>>>>>>> [  390.241519]  device_driver_detach+0x50/0xd0
>>>>>>> [  390.245686]  unbind_store+0xf8/0x120
>>>>>>> [  390.249245]  drv_attr_store+0x30/0x44
>>>>>>> [  390.252891]  sysfs_kf_write+0x50/0x60
>>>>>>> [  390.256537]  kernfs_fop_write_iter+0x134/0x1cc
>>>>>>> [  390.260964]  new_sync_write+0xf0/0x18c
>>>>>>> [  390.264696]  vfs_write+0x230/0x2d0
>>>>>>> [  390.268082]  ksys_write+0x74/0x100
>>>>>>> [  390.271467]  __arm64_sys_write+0x28/0x3c
>>>>>>> [  390.275373]  invoke_syscall.constprop.0+0x58/0xf0
>>>>>>> [  390.280061]  el0_svc_common.constprop.0+0x160/0x164
>>>>>>> [  390.284922]  do_el0_svc+0x34/0xcc
>>>>>>> [  390.288221]  el0_svc+0x30/0x140
>>>>>>> [  390.291346]  el0t_64_sync_handler+0xa4/0x130
>>>>>>> [  390.295599]  el0t_64_sync+0x1a0/0x1a4
>>>>>>> [  390.299245] ---[ end trace 0000000000000000 ]---
>>>>>>>
>>>>>>>
>>>>>>> I put some traces in the code and I can see that
>>>>>>> iommu_device_use_default_domain() effectively is called on
>>>>>>> 0004:01:00.0 e1000e device on pci_dma_configure() but at that time
>>>>>>> the iommu group is NULL:
>>>>>>> [   10.569427] e1000e 0004:01:00.0: ------ ENTRY pci_dma_configure
>>>>>>> driver_managed_area=0
>>>>>>> [   10.569431] e1000e 0004:01:00.0: ****
>>>>>>> iommu_device_use_default_domain ENTRY
>>>>>>> [   10.569433] e1000e 0004:01:00.0: ****
>>>>>>> iommu_device_use_default_domain no group
>>>>>>> [   10.569435] e1000e 0004:01:00.0: pci_dma_configure
>>>>>>> iommu_device_use_default_domain returned 0
>>>>>>> [   10.569492] e1000e 0004:01:00.0: Adding to iommu group 3
>>>>>>>
>>>>>>> ^^^the group is added after the
>>>>>>> iommu_device_use_default_domain() call
>>>>>>> So the group->owner_cnt is not incremented as expected.
>>>>>>
>>>>>> Thank you for reporting this. Do you have any idea why the driver is
>>>>>> loaded before iommu_probe_device()?
>>>>>
>>>>> Urgh, this is the horrible firmware-data-ordering thing again. The
>>>>> stuff I've been saying about having to rework the whole .dma_configure
>>>>> mechanism in the near future is to fix this properly.
>>>>>
>>>>> The summary is that in patch #4, calling
>>>>> iommu_device_use_default_domain() *before* {of,acpi}_dma_configure is
>>>>> currently a problem. As things stand, the IOMMU driver ignored the
>>>>> initial iommu_probe_device() call when the device was added, since at
>>>>> that point it had no fwspec yet. In this situation,
>>>>> {of,acpi}_iommu_configure() are retriggering iommu_probe_device()
>>>>> after the IOMMU driver has seen the firmware data via .of_xlate to
>>>>> learn that it it actually responsible for the given device.
>>>>
>>>> thank you for providing the info. Hope this is something Lu can work
>>>> around.
>>>
>>> Hopefully it's just a case of flipping the calls around, so that
>>> iommu_use_default_domain() goes at the end, and calls
>>> arch_teardown_dma_ops() if it fails. From a quick skim I *think* that
>>> should still work out to the desired behaviour (or at least close
>>> enough that we can move forward without a circular dependency between
>>> fixes...)
>>
>> This is a reasonable solution to me. Thank you for the information and
>> suggestion.
>>
>> Eric, I have updated the patch #4 and uploaded a new version here:
>>
>> https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v8
> 
> with v8 I do not hit the warning anymore and the owner accounting seems
> to work as expected.

Thank you, Eric! I will post the v8 soon.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-03-08  0:31 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-28  0:50 [PATCH v7 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
2022-02-28  0:50 ` [PATCH v7 01/11] iommu: Add DMA ownership management interfaces Lu Baolu
2022-03-04 10:34   ` Eric Auger
2022-03-04 10:43     ` Lu Baolu
2022-03-04 12:22       ` Robin Murphy
2022-03-04 13:55         ` Eric Auger
2022-03-04 14:10           ` Robin Murphy
2022-03-07  3:27             ` Lu Baolu
2022-03-07 12:42               ` Eric Auger
2022-03-08  0:31                 ` Lu Baolu
2022-02-28  0:50 ` [PATCH v7 02/11] driver core: Add dma_cleanup callback in bus_type Lu Baolu
2022-02-28  0:50 ` [PATCH v7 03/11] amba: Stop sharing platform_dma_configure() Lu Baolu
2022-02-28  0:50 ` [PATCH v7 04/11] bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management Lu Baolu
2022-02-28  0:50 ` [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma Lu Baolu
2022-02-28 19:54   ` Bjorn Helgaas
2022-02-28  0:50 ` [PATCH v7 06/11] PCI: portdrv: " Lu Baolu
2022-02-28 19:56   ` Bjorn Helgaas
2022-03-01  2:54     ` Lu Baolu
2022-02-28  0:50 ` [PATCH v7 07/11] vfio: Set DMA ownership for VFIO devices Lu Baolu
2022-02-28 22:06   ` Alex Williamson
2022-02-28  0:50 ` [PATCH v7 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
2022-02-28 22:06   ` Alex Williamson
2022-02-28  0:50 ` [PATCH v7 09/11] vfio: Delete the unbound_list Lu Baolu
2022-02-28 22:06   ` Alex Williamson
2022-02-28  0:50 ` [PATCH v7 10/11] vfio: Remove iommu group notifier Lu Baolu
2022-02-28 22:06   ` Alex Williamson
2022-02-28  0:50 ` [PATCH v7 11/11] iommu: Remove iommu group changes notifier Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).