* [PATCH v4 00/13] Fix BUG_ON in vfio_iommu_group_notifier()
@ 2021-12-17  6:36 ` Lu Baolu
  0 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

Hi folks,

The iommu group is the minimal isolation boundary for DMA. Devices in
a group can access each other's MMIO registers via peer to peer DMA
and also need to share the same I/O address space.

First, once the I/O address space is assigned to user control, it is no
longer available to the dma_map* API, which effectively leaves the DMA API
non-functional.

Second, userspace can use DMA initiated by a device that it controls
to access the MMIO spaces of other devices in the group. This allows
userspace to indirectly attack any kernel owned device and its driver.

Therefore groups must either be entirely under kernel control or
userspace control, never a mixture. Unfortunately some systems have
problems with the granularity of groups and there are a couple of
important exceptions:

 - pci_stub allows the admin to block driver binding on a device and
   make it permanently shared with userspace. Since PCI stub does not
   do DMA it is safe, however the admin must understand that using
   pci_stub allows userspace to attack whatever device it was bound
   to.

 - PCI bridges are sometimes included in groups. Typically PCI bridges
   do not use DMA, and generally do not have MMIO regions.

Generally any device that does not have any MMIO registers is a
possible candidate for an exception.

Currently vfio adopts a workaround to detect violations of the above
restrictions by monitoring the driver core BOUND event, and hardwiring
the above exceptions. Since there is no way for vfio to reject driver
binding at this point, BUG_ON() is triggered if a violation is
captured (kernel driver BOUND event on a group which already has some
devices assigned to userspace). Aside from the bad user experience
this opens a way for root userspace to crash the kernel, even in high
integrity configurations, by manipulating the module binding and
triggering the BUG_ON.

This series solves this problem by making the user/kernel ownership a
core concept at the IOMMU layer. The driver core enforces kernel
ownership while drivers are bound and violations now result in error
codes during probe, not BUG_ON failures.

Patch partitions:
  [PATCH 1-4]: Detect DMA ownership conflicts during driver binding;
  [PATCH 5-8]: Add security context management for assigned devices;
  [PATCH 9-13]: Various cleanups.

This is also part one of three initial series for IOMMUFD:
 * Move IOMMU Group security into the iommu layer
 - Generic IOMMUFD implementation
 - VFIO ability to consume IOMMUFD

Change log:
v1: initial post
  - https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/

v2:
  - https://lore.kernel.org/linux-iommu/20211128025051.355578-1-baolu.lu@linux.intel.com/

  - Move kernel dma ownership auto-claiming from driver core to bus
    callback. [Greg/Christoph/Robin/Jason]
    https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#m153706912b770682cb12e3c28f57e171aa1f9d0c

  - Code and interface refactoring for iommu_set/release_dma_owner()
    interfaces. [Jason]
    https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#mea70ed8e4e3665aedf32a5a0a7db095bf680325e

  - [NEW]Add new iommu_attach/detach_device_shared() interfaces for
    multiple devices group. [Robin/Jason]
    https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#mea70ed8e4e3665aedf32a5a0a7db095bf680325e

  - [NEW]Use iommu_attach/detach_device_shared() in drm/tegra drivers.

  - Refactoring and description refinement.

v3:
  - https://lore.kernel.org/linux-iommu/20211206015903.88687-1-baolu.lu@linux.intel.com/

  - Rename bus_type::dma_unconfigure to bus_type::dma_cleanup. [Greg]
    https://lore.kernel.org/linux-iommu/c3230ace-c878-39db-1663-2b752ff5384e@linux.intel.com/T/#m6711e041e47cb0cbe3964fad0a3466f5ae4b3b9b

  - Avoid _platform_dma_configure for platform_bus_type::dma_configure.
    [Greg]
    https://lore.kernel.org/linux-iommu/c3230ace-c878-39db-1663-2b752ff5384e@linux.intel.com/T/#m43fc46286611aa56a5c0eeaad99d539e5519f3f6

  - Patch "0012-iommu-Add-iommu_at-de-tach_device_shared-for-mult.patch"
    and "0018-drm-tegra-Use-the-iommu-dma_owner-mechanism.patch" have
    been tested by Dmitry Osipenko <digetx@gmail.com>.

v4:
  - Remove unnecessary tegra->domain check in the tegra patch. (Jason)
  - Remove DMA_OWNER_NONE. (Joerg)
  - Change refcount to unsigned int. (Christoph)
  - Move mutex lock into group set_dma_owner functions. (Christoph)
  - Add kernel doc for iommu_attach/detach_domain_shared(). (Christoph)
  - Move dma auto-claim into driver core. (Jason/Christoph)

This is based on next branch of linux-iommu tree:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
and also available on github:
https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v4

Merry Christmas to you all!

Best regards,
baolu

Jason Gunthorpe (2):
  vfio: Delete the unbound_list
  drm/tegra: Use the iommu dma_owner mechanism

Lu Baolu (11):
  iommu: Add device dma ownership set/release interfaces
  driver core: Set DMA ownership during driver bind/unbind
  PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  iommu: Add security context management for assigned devices
  iommu: Expose group variants of dma ownership interfaces
  iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  vfio: Set DMA USER ownership for VFIO devices
  vfio: Remove use of vfio_group_viable()
  vfio: Remove iommu group notifier
  iommu: Remove iommu group changes notifier

 include/linux/device/driver.h         |   2 +
 include/linux/iommu.h                 |  91 ++++++--
 drivers/base/dd.c                     |  37 ++-
 drivers/gpu/drm/tegra/dc.c            |   1 +
 drivers/gpu/drm/tegra/drm.c           |  54 ++---
 drivers/gpu/drm/tegra/gr2d.c          |   1 +
 drivers/gpu/drm/tegra/gr3d.c          |   1 +
 drivers/gpu/drm/tegra/vic.c           |   3 +-
 drivers/iommu/iommu.c                 | 321 +++++++++++++++++++-------
 drivers/pci/pci-stub.c                |   3 +
 drivers/pci/pcie/portdrv_pci.c        |   5 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc.c     |   1 +
 drivers/vfio/pci/vfio_pci.c           |   3 +
 drivers/vfio/platform/vfio_amba.c     |   1 +
 drivers/vfio/platform/vfio_platform.c |   1 +
 drivers/vfio/vfio.c                   | 248 ++------------------
 16 files changed, 406 insertions(+), 367 deletions(-)

-- 
2.25.1



* [PATCH v4 01/13] iommu: Add device dma ownership set/release interfaces
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:36   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

From the perspective of who is initiating the device to do DMA, device
DMA could be divided into the following types:

        DMA_OWNER_DMA_API: Device DMAs are initiated by a kernel driver
                        through the kernel DMA API.
        DMA_OWNER_PRIVATE_DOMAIN: Device DMAs are initiated by a kernel
                        driver with its own PRIVATE domain.
        DMA_OWNER_PRIVATE_DOMAIN_USER: Device DMAs are initiated by
                        userspace.

Different DMA ownerships are exclusive for all devices in the same iommu
group as an iommu group is the smallest granularity of device isolation
and protection that the IOMMU subsystem can guarantee. This extends the
iommu core to enforce this exclusion.

Basically two new interfaces are provided:

        int iommu_device_set_dma_owner(struct device *dev,
                enum iommu_dma_owner type, void *owner_cookie);
        void iommu_device_release_dma_owner(struct device *dev,
                enum iommu_dma_owner type);

Although the above interfaces are per-device, the DMA owner is tracked per
group under the hood. An iommu group cannot have different DMA ownership
types set at the same time. A violation of this rule makes
iommu_device_set_dma_owner() fail with -EBUSY.
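
To illustrate the per-group tracking described above, here is a minimal
userspace sketch in C. It is not the kernel code: struct group_model and
the group_model_*() helpers are hypothetical stand-ins for the fields and
static helpers this patch adds to drivers/iommu/iommu.c, and the group
mutex is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the enum added by this patch. */
enum iommu_dma_owner {
	DMA_OWNER_DMA_API,
	DMA_OWNER_PRIVATE_DOMAIN,
	DMA_OWNER_PRIVATE_DOMAIN_USER,
};

/* Hypothetical stand-in for the ownership fields of struct iommu_group. */
struct group_model {
	enum iommu_dma_owner dma_owner;
	unsigned int owner_cnt;
	void *owner_cookie;
};

/* Claim ownership; refuse a claim that conflicts with the current owner. */
int group_model_set_owner(struct group_model *g,
			  enum iommu_dma_owner owner, void *cookie)
{
	if (g->owner_cnt &&
	    (g->dma_owner != owner || g->owner_cookie != cookie))
		return -EBUSY;

	g->dma_owner = owner;
	g->owner_cookie = cookie;
	g->owner_cnt++;
	return 0;
}

/* Drop one claim; the type resets to DMA_API once the last claim is gone. */
void group_model_release_owner(struct group_model *g,
			       enum iommu_dma_owner owner)
{
	if (!g->owner_cnt || g->dma_owner != owner)
		return;	/* the kernel code would WARN_ON() here */

	if (--g->owner_cnt == 0)
		g->dma_owner = DMA_OWNER_DMA_API;
}

/* Two devices in one group: later claims must match the first one. */
void group_model_example(void)
{
	struct group_model g = { 0 };

	assert(group_model_set_owner(&g, DMA_OWNER_DMA_API, NULL) == 0);
	assert(group_model_set_owner(&g, DMA_OWNER_DMA_API, NULL) == 0);
	assert(group_model_set_owner(&g, DMA_OWNER_PRIVATE_DOMAIN_USER,
				     NULL) == -EBUSY);

	group_model_release_owner(&g, DMA_OWNER_DMA_API);
	group_model_release_owner(&g, DMA_OWNER_DMA_API);
	assert(g.owner_cnt == 0);
}
```

The counter is what lets several devices of one group share the same
ownership type, while a single conflicting claim is rejected.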

Kernel drivers which do DMA have DMA_OWNER_DMA_API automatically set/
released in the driver binding/unbinding process (see next patch).

A kernel driver which doesn't do DMA can avoid setting the owner type.
A device bound to such a driver is treated the same as a driver-less
device, which is compatible with all owner types.

A userspace driver framework (e.g. vfio) should set
DMA_OWNER_PRIVATE_DOMAIN_USER for a device before userspace is allowed
to access it, plus an owner cookie pointer to mark the user identity so a
single group cannot be operated by multiple users simultaneously.
Conversely, the owner type should be released after the user access
permission is withdrawn.
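
The role of the owner cookie can be sketched the same way. This is a
reduced userspace model in C, not vfio code: struct user_ctx,
struct user_owned_group, user_claim() and user_release() are hypothetical
names, and only the USER ownership case is modelled. The address of the
per-user context plays the role of owner_cookie.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-user context; only its address is used, as the cookie. */
struct user_ctx {
	int id;
};

/* Hypothetical reduction of a group to its USER-ownership state. */
struct user_owned_group {
	unsigned int owner_cnt;
	void *owner_cookie;
};

/* Would run before userspace is given access to a device in the group. */
int user_claim(struct user_owned_group *g, struct user_ctx *user)
{
	if (g->owner_cnt && g->owner_cookie != user)
		return -EBUSY;	/* group is already operated by another user */

	g->owner_cookie = user;
	g->owner_cnt++;
	return 0;
}

/* Would run after the user's access permission is withdrawn. */
void user_release(struct user_owned_group *g)
{
	if (g->owner_cnt)
		g->owner_cnt--;
}

/* One user may claim several devices; a second user must wait. */
void user_cookie_example(void)
{
	struct user_ctx a = { 1 }, b = { 2 };
	struct user_owned_group g = { 0 };

	assert(user_claim(&g, &a) == 0);
	assert(user_claim(&g, &a) == 0);
	assert(user_claim(&g, &b) == -EBUSY);

	user_release(&g);
	user_release(&g);
	assert(user_claim(&g, &b) == 0);
}
```

Because the cookie is only compared while owner_cnt is non-zero, a stale
pointer left behind by the previous user does not block the next claim.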

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 34 ++++++++++++++++
 drivers/iommu/iommu.c | 95 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 129 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..53a023ee1ac0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -162,6 +162,21 @@ enum iommu_dev_features {
 	IOMMU_DEV_FEAT_IOPF,
 };
 
+/**
+ * enum iommu_dma_owner - IOMMU DMA ownership
+ * @DMA_OWNER_DMA_API: Device DMAs are initiated by a kernel driver through
+ *			the kernel DMA API.
+ * @DMA_OWNER_PRIVATE_DOMAIN: Device DMAs are initiated by a kernel driver
+ *			which provides an UNMANAGED domain.
+ * @DMA_OWNER_PRIVATE_DOMAIN_USER: Device DMAs are initiated by userspace,
+ *			kernel ensures that DMAs never go to kernel memory.
+ */
+enum iommu_dma_owner {
+	DMA_OWNER_DMA_API,
+	DMA_OWNER_PRIVATE_DOMAIN,
+	DMA_OWNER_PRIVATE_DOMAIN_USER,
+};
+
 #define IOMMU_PASID_INVALID	(-1U)
 
 #ifdef CONFIG_IOMMU_API
@@ -681,6 +696,10 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
+int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
+			       void *owner_cookie);
+void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1081,6 +1100,21 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline int iommu_device_set_dma_owner(struct device *dev,
+					     enum iommu_dma_owner owner,
+					     void *owner_cookie)
+{
+	if (owner != DMA_OWNER_DMA_API)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline void iommu_device_release_dma_owner(struct device *dev,
+						  enum iommu_dma_owner owner)
+{
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8b86406b7162..5439bf45afb2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,9 @@ struct iommu_group {
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
 	struct list_head entry;
+	enum iommu_dma_owner dma_owner;
+	unsigned int owner_cnt;
+	void *owner_cookie;
 };
 
 struct group_device {
@@ -3351,3 +3354,95 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 	return ret;
 }
+
+static int iommu_group_set_dma_owner(struct iommu_group *group,
+				     enum iommu_dma_owner owner,
+				     void *owner_cookie)
+{
+	int ret = 0;
+
+	mutex_lock(&group->mutex);
+	if (group->owner_cnt &&
+	    (group->dma_owner != owner ||
+	     group->owner_cookie != owner_cookie)) {
+		ret = -EBUSY;
+		goto unlock_out;
+	}
+
+	group->dma_owner = owner;
+	group->owner_cookie = owner_cookie;
+	group->owner_cnt++;
+
+unlock_out:
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+
+static void iommu_group_release_dma_owner(struct iommu_group *group,
+					  enum iommu_dma_owner owner)
+{
+	mutex_lock(&group->mutex);
+	if (WARN_ON(!group->owner_cnt || group->dma_owner != owner))
+		goto unlock_out;
+
+	if (--group->owner_cnt > 0)
+		goto unlock_out;
+
+	group->dma_owner = DMA_OWNER_DMA_API;
+
+unlock_out:
+	mutex_unlock(&group->mutex);
+}
+
+/**
+ * iommu_device_set_dma_owner() - Set DMA ownership of a device
+ * @dev: The device.
+ * @owner: DMA ownership type.
+ * @owner_cookie: Caller specified pointer. Could be used for exclusive
+ *                declaration. Could be NULL.
+ *
+ * Set the DMA ownership of a device. The different ownerships are
+ * exclusive. The caller could specify an owner_cookie pointer so that
+ * the same DMA ownership could be exclusive among different owners.
+ */
+int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
+			       void *owner_cookie)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+	int ret;
+
+	if (!group) {
+		if (owner == DMA_OWNER_DMA_API)
+			return 0;
+		else
+			return -ENODEV;
+	}
+
+	ret = iommu_group_set_dma_owner(group, owner, owner_cookie);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_device_set_dma_owner);
+
+/**
+ * iommu_device_release_dma_owner() - Release DMA ownership of a device
+ * @dev: The device.
+ * @owner: The DMA ownership type.
+ *
+ * Release the DMA ownership claimed by iommu_device_set_dma_owner().
+ */
+void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+
+	if (!group) {
+		WARN_ON(owner != DMA_OWNER_DMA_API);
+		return;
+	}
+
+	iommu_group_release_dma_owner(group, owner);
+	iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
-- 
2.25.1



* [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:36   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

This extends really_probe() to allow checking for DMA ownership conflicts
during the driver binding process. By default, DMA_OWNER_DMA_API is
claimed for the bound driver before calling its .probe() callback. If this
operation fails (e.g. the iommu group of the target device already has
DMA_OWNER_PRIVATE_DOMAIN_USER set), the binding process is aborted to avoid
breaking the security contract for devices in the iommu group.

Without this change, the vfio driver has to listen to a bus BOUND_DRIVER
event and then BUG_ON() in case of a DMA ownership conflict. This leads to
a bad user experience since a careless driver binding operation may crash
the system if the admin overlooks the group restriction. Aside from bad
design, this leads to a security problem as a root user can force the kernel
to BUG() even with lockdown=integrity.

A driver may set a new flag (suppress_auto_claim_dma_owner) to disable the
auto claim in the binding process. Examples include kernel drivers (pci_stub,
PCI bridge drivers, etc.) which don't trigger DMA at all and thus can be
safely exempted from the DMA ownership check, and userspace framework drivers
(vfio/vdpa etc.) which need to manually claim DMA_OWNER_PRIVATE_DOMAIN_USER
when assigning a device to userspace.
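
As a rough illustration of the resulting bind-time behaviour, here is a
small userspace model in C. It is a sketch under stated assumptions rather
than the driver core code: struct group_state, struct driver_model,
claim_owner() and bind_driver() are hypothetical names, and only two
ownership types are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical two-value reduction of enum iommu_dma_owner. */
enum owner_type { OWNER_DMA_API, OWNER_USER };

/* Hypothetical per-group ownership state. */
struct group_state {
	enum owner_type owner;
	unsigned int owner_cnt;
};

/* Hypothetical driver carrying only the new flag. */
struct driver_model {
	bool suppress_auto_claim_dma_owner;
};

/* Claim the group for one ownership type; conflicting claims fail. */
int claim_owner(struct group_state *g, enum owner_type owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;

	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

/* Bind path: auto-claim kernel DMA ownership unless the driver opts out. */
int bind_driver(struct group_state *g, const struct driver_model *drv)
{
	if (drv->suppress_auto_claim_dma_owner)
		return 0;

	return claim_owner(g, OWNER_DMA_API);
}

/* A group with one device assigned to userspace and two bind attempts. */
void bind_example(void)
{
	struct group_state g = { 0 };
	const struct driver_model dma_driver = { false };
	const struct driver_model stub_driver = { true };

	/* A vfio-like framework assigns one device of the group to a user. */
	assert(claim_owner(&g, OWNER_USER) == 0);

	/* A stub-like driver that never does DMA may still bind. */
	assert(bind_driver(&g, &stub_driver) == 0);

	/* A DMA-capable kernel driver gets an error instead of a BUG_ON(). */
	assert(bind_driver(&g, &dma_driver) == -EBUSY);
}
```

The point of the model is the last assertion: the conflict surfaces as an
error return from the bind path, which the caller can handle.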

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/linux-iommu/20210922123931.GI327412@nvidia.com/
Link: https://lore.kernel.org/linux-iommu/20210928115751.GK964074@nvidia.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/device/driver.h |  2 ++
 drivers/base/dd.c             | 37 ++++++++++++++++++++++++++++++-----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..f5bf7030c416 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -54,6 +54,7 @@ enum probe_type {
  * @owner:	The module owner.
  * @mod_name:	Used for built-in modules.
  * @suppress_bind_attrs: Disables bind/unbind via sysfs.
+ * @suppress_auto_claim_dma_owner: Disable kernel dma auto-claim.
  * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
@@ -100,6 +101,7 @@ struct device_driver {
 	const char		*mod_name;	/* used for built-in modules */
 
 	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool suppress_auto_claim_dma_owner;
 	enum probe_type probe_type;
 
 	const struct of_device_id	*of_match_table;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 68ea1f949daa..b04eec5dcefa 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -28,6 +28,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/pinctrl/devinfo.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -538,6 +539,32 @@ static int call_driver_probe(struct device *dev, struct device_driver *drv)
 	return ret;
 }
 
+static int device_dma_configure(struct device *dev, struct device_driver *drv)
+{
+	int ret;
+
+	if (!dev->bus->dma_configure)
+		return 0;
+
+	ret = dev->bus->dma_configure(dev);
+	if (ret)
+		return ret;
+
+	if (!drv->suppress_auto_claim_dma_owner)
+		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);
+
+	return ret;
+}
+
+static void device_dma_cleanup(struct device *dev, struct device_driver *drv)
+{
+	if (!dev->bus->dma_configure)
+		return;
+
+	if (!drv->suppress_auto_claim_dma_owner)
+		iommu_device_release_dma_owner(dev, DMA_OWNER_DMA_API);
+}
+
 static int really_probe(struct device *dev, struct device_driver *drv)
 {
 	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
@@ -574,11 +601,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (ret)
 		goto pinctrl_bind_failed;
 
-	if (dev->bus->dma_configure) {
-		ret = dev->bus->dma_configure(dev);
-		if (ret)
-			goto probe_failed;
-	}
+	if (device_dma_configure(dev, drv))
+		goto pinctrl_bind_failed;
 
 	ret = driver_sysfs_add(dev);
 	if (ret) {
@@ -660,6 +684,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
+
+	device_dma_cleanup(dev, drv);
 pinctrl_bind_failed:
 	device_links_no_driver(dev);
 	devres_release_all(dev);
@@ -1204,6 +1230,7 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 		else if (drv->remove)
 			drv->remove(dev);
 
+		device_dma_cleanup(dev, drv);
 		device_links_driver_cleanup(dev);
 
 		devres_release_all(dev);
-- 
2.25.1
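
The bind-time check described in this patch can be modelled in plain
userspace C. This is only a sketch under stated assumptions: the types
and helpers below (struct group, group_claim(), bind_driver() and so on)
are hypothetical stand-ins, not the kernel definitions, and the owner
bookkeeping is reduced to a type plus a reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum dma_owner { OWNER_NONE, OWNER_DMA_API, OWNER_USER };

struct group {
	enum dma_owner owner;    /* owner type shared by the whole group */
	unsigned int owner_cnt;  /* number of outstanding claims */
};

struct driver {
	bool suppress_auto_claim_dma_owner;
};

/* A group may hold only one owner type at a time. */
static int group_claim(struct group *g, enum dma_owner owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

static void group_release(struct group *g, enum dma_owner owner)
{
	assert(g->owner_cnt && g->owner == owner);
	if (--g->owner_cnt == 0)
		g->owner = OWNER_NONE;
}

/* Models the idea of device_dma_configure(): claim before .probe(). */
static int bind_driver(struct group *g, const struct driver *drv)
{
	if (drv->suppress_auto_claim_dma_owner)
		return 0;
	return group_claim(g, OWNER_DMA_API);
}

/* Models device_dma_cleanup(): drop the claim taken at bind time. */
static void unbind_driver(struct group *g, const struct driver *drv)
{
	if (!drv->suppress_auto_claim_dma_owner)
		group_release(g, OWNER_DMA_API);
}
```

In this model, binding an ordinary driver to a group that userspace
already owns returns -EBUSY instead of crashing, while a driver that
sets the suppress flag binds without touching the group's ownership.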


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:36   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: kvm, rafael, David Airlie, linux-pci, Thierry Reding,
	Diana Craciun, Dmitry Osipenko, Will Deacon, Stuart Yoder,
	Jonathan Hunter, Chaitanya Kulkarni, Dan Williams, Cornelia Huck,
	linux-kernel, Li Yang, iommu, Jacob jun Pan, Daniel Vetter,
	Robin Murphy

pci_dma_configure() marks the iommu_group as containing only devices
with kernel drivers that manage DMA. Avoid this default behavior for
pci_stub because it does not program any DMA itself. This keeps pci_stub
usable by the admin to block driver binding after DMA ownership is
applied to vfio.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pci-stub.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
index e408099fea52..6324c68602b4 100644
--- a/drivers/pci/pci-stub.c
+++ b/drivers/pci/pci-stub.c
@@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
 	.name		= "pci-stub",
 	.id_table	= NULL,	/* only dynamic id's */
 	.probe		= pci_stub_probe,
+	.driver		= {
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static int __init pci_stub_init(void)
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 04/13] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:36   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
then all of the downstream devices will be part of the same IOMMU group
as the bridge. The existing vfio framework allows the portdrv driver to
be bound to the bridge while its downstream devices are assigned to user
space. pci_dma_configure() marks the iommu_group as containing only
devices with kernel drivers that manage DMA. Avoid this default behavior
for the portdrv driver to stay compatible with the current vfio policy.

Commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended the above
policy to all kernel drivers of bridge class. This is not always safe.
For example, the shpchp_core driver relies on PCI MMIO access for its
controller functionality. With its downstream devices assigned to
userspace, the MMIO might be changed through user-initiated P2P accesses
without any notification. This might break the kernel driver's integrity
and lead to unpredictable consequences.

Before a bridge driver suppresses the default kernel DMA ownership
claim, we should consider:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
    a dma_map_* API is a sure indication that the driver is doing DMA.

 2) If the bridge driver uses MMIO, is it tolerant of hostile
    userspace also touching the same MMIO registers via P2P DMA
    attacks?

Conservatively, if the driver maps an MMIO region at all, we can say that
it fails the test.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 35eca6277a96..c48a8734f9c4 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
-	.driver.pm	= PCIE_PORTDRV_PM_OPS,
+	.driver		= {
+		.pm = PCIE_PORTDRV_PM_OPS,
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
-- 
2.25.1
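
The two review questions in the commit message above can be written
down as a small predicate. This is a hedged sketch, not kernel code:
struct bridge_driver_traits and may_suppress_auto_claim() are
hypothetical names, and the traits have to be filled in by a person
auditing the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical audit summary of what a bridge driver does. */
struct bridge_driver_traits {
	bool calls_pci_set_master;  /* enables bus mastering */
	bool uses_dma_map_api;      /* calls any dma_map_* helper */
	bool maps_mmio;             /* maps an MMIO region at all */
};

/*
 * Conservative reading of the checklist: any DMA use, or any MMIO
 * mapping, means the driver should keep the default auto-claim.
 */
static bool may_suppress_auto_claim(const struct bridge_driver_traits *t)
{
	assert(t != NULL);

	/* Question 1: does the driver use DMA? */
	if (t->calls_pci_set_master || t->uses_dma_map_api)
		return false;

	/* Question 2, conservatively: mapping MMIO fails the test. */
	if (t->maps_mmio)
		return false;

	return true;
}
```

A driver that relies on MMIO for its controller functionality, as in
the shpchp_core example, would have maps_mmio set and so would not
qualify under this conservative rule.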


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 05/13] iommu: Add security context management for assigned devices
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

When an iommu group has DMA_OWNER_PRIVATE_DOMAIN_USER set for the first
time, it is a contract that the group could be assigned to userspace from
now on. The group must be detached from the default iommu domain, and all
devices in the group are blocked from doing DMA until the group is
attached to a user-controlled iommu_domain. Correspondingly, the default
domain should be reattached after the last DMA_OWNER_PRIVATE_DOMAIN_USER
is released.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5439bf45afb2..573e253bad51 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -292,7 +292,12 @@ int iommu_probe_device(struct device *dev)
 	mutex_lock(&group->mutex);
 	iommu_alloc_default_domain(group, dev);
 
-	if (group->default_domain) {
+	/*
+	 * If any device in the group has been initialized for user dma,
+	 * avoid attaching the default domain.
+	 */
+	if (group->default_domain &&
+	    group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER) {
 		ret = __iommu_attach_device(group->default_domain, dev);
 		if (ret) {
 			mutex_unlock(&group->mutex);
@@ -2323,7 +2328,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (group->default_domain && group->domain != group->default_domain)
+	if (group->domain && group->domain != group->default_domain)
 		return -EBUSY;
 
 	ret = __iommu_group_for_each_dev(group, domain,
@@ -2360,7 +2365,12 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (!group->default_domain) {
+	/*
+	 * If any device in the group has been initialized for user dma,
+	 * avoid re-attaching the default domain.
+	 */
+	if (!group->default_domain ||
+	    group->dma_owner == DMA_OWNER_PRIVATE_DOMAIN_USER) {
 		__iommu_group_for_each_dev(group, domain,
 					   iommu_group_do_detach_device);
 		group->domain = NULL;
@@ -3373,6 +3383,16 @@ static int iommu_group_set_dma_owner(struct iommu_group *group,
 	group->owner_cookie = owner_cookie;
 	group->owner_cnt++;
 
+	/*
+	 * We must ensure that any device DMAs issued after this call
+	 * are discarded. DMAs can only reach real memory once someone
+	 * has attached a real domain.
+	 */
+	if (owner == DMA_OWNER_PRIVATE_DOMAIN_USER &&
+	    group->domain &&
+	    !WARN_ON(group->domain != group->default_domain))
+		__iommu_detach_group(group->domain, group);
+
 unlock_out:
 	mutex_unlock(&group->mutex);
 
@@ -3391,6 +3411,15 @@ static void iommu_group_release_dma_owner(struct iommu_group *group,
 
 	group->dma_owner = DMA_OWNER_DMA_API;
 
+	/*
+	 * The UNMANAGED domain should be detached before all USER
+	 * owners have been released.
+	 */
+	if (owner == DMA_OWNER_PRIVATE_DOMAIN_USER) {
+		if (!WARN_ON(group->domain) && group->default_domain)
+			__iommu_attach_group(group->default_domain, group);
+	}
+
 unlock_out:
 	mutex_unlock(&group->mutex);
 }
-- 
2.25.1
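
The domain transitions this patch describes can be traced with a small
userspace model. It is a sketch under assumptions, not the kernel
implementation: struct group, struct domain and the group_*() helpers
are hypothetical, a NULL domain pointer stands for "DMA blocked", and
the WARN_ON() paths of the real code are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum dma_owner { OWNER_DMA_API, OWNER_PRIVATE_DOMAIN_USER };

struct domain { int id; };

struct group {
	struct domain *default_domain;
	struct domain *domain;   /* currently attached; NULL blocks DMA */
	enum dma_owner owner;
	unsigned int owner_cnt;
};

static int group_set_owner(struct group *g, enum dma_owner owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;
	g->owner = owner;
	/* First user claim: drop the default domain so DMA is blocked. */
	if (g->owner_cnt++ == 0 && owner == OWNER_PRIVATE_DOMAIN_USER)
		g->domain = NULL;
	return 0;
}

static void group_release_owner(struct group *g, enum dma_owner owner)
{
	assert(g->owner_cnt && g->owner == owner);
	if (--g->owner_cnt)
		return;
	g->owner = OWNER_DMA_API;
	/* Last user claim gone: restore the default domain. */
	if (owner == OWNER_PRIVATE_DOMAIN_USER && g->domain == NULL)
		g->domain = g->default_domain;
}

/* Refused while another non-default domain is still attached. */
static int group_attach(struct group *g, struct domain *d)
{
	if (g->domain && g->domain != g->default_domain)
		return -EBUSY;
	g->domain = d;
	return 0;
}

/* Under user ownership, detach must not fall back to the default. */
static void group_detach(struct group *g)
{
	if (!g->default_domain || g->owner == OWNER_PRIVATE_DOMAIN_USER)
		g->domain = NULL;
	else
		g->domain = g->default_domain;
}
```

Walking a group through claim, attach, detach and release in this model
shows the attached domain going from the default domain to NULL, to the
user domain, back to NULL, and finally back to the default domain.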


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 06/13] iommu: Expose group variants of dma ownership interfaces
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

vfio needs to set DMA_OWNER_PRIVATE_DOMAIN_USER for the entire group
when attaching it to a vfio container. Expose group variants of the
set/release DMA ownership interfaces for this purpose.

This also exposes the helper iommu_group_dma_owner_unclaimed() so that
vfio can report to userspace whether the group is viable for user
assignment, for compatibility with VFIO_GROUP_FLAGS_VIABLE.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 21 +++++++++++++++++++
 drivers/iommu/iommu.c | 47 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 53a023ee1ac0..5ad4cf13370d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -699,6 +699,10 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
 			       void *owner_cookie);
 void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
+			      void *owner_cookie);
+void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
+bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -1115,6 +1119,23 @@ static inline void iommu_device_release_dma_owner(struct device *dev,
 						  enum iommu_dma_owner owner)
 {
 }
+
+static inline int iommu_group_set_dma_owner(struct iommu_group *group,
+					    enum iommu_dma_owner owner,
+					    void *owner_cookie)
+{
+	return -EINVAL;
+}
+
+static inline void iommu_group_release_dma_owner(struct iommu_group *group,
+						 enum iommu_dma_owner owner)
+{
+}
+
+static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+	return false;
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 573e253bad51..8bec71b1cc18 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3365,9 +3365,19 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	return ret;
 }
 
-static int iommu_group_set_dma_owner(struct iommu_group *group,
-				     enum iommu_dma_owner owner,
-				     void *owner_cookie)
+/**
+ * iommu_group_set_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: DMA owner type.
+ * @owner_cookie: Caller specified pointer. Could be used for exclusive
+ *                declaration. Could be NULL.
+ *
+ * This is to support backward compatibility for legacy vfio which manages
+ * dma ownership in group level. New invocations on this interface should be
+ * prohibited. Instead, please turn to iommu_device_set_dma_owner().
+ */
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
+			      void *owner_cookie)
 {
 	int ret = 0;
 
@@ -3398,9 +3408,16 @@ static int iommu_group_set_dma_owner(struct iommu_group *group,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_group_set_dma_owner);
 
-static void iommu_group_release_dma_owner(struct iommu_group *group,
-					  enum iommu_dma_owner owner)
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ * @owner: DMA owner type.
+ *
+ * Release the DMA ownership claimed by iommu_group_set_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner)
 {
 	mutex_lock(&group->mutex);
 	if (WARN_ON(!group->owner_cnt || group->dma_owner != owner))
@@ -3423,6 +3440,26 @@ static void iommu_group_release_dma_owner(struct iommu_group *group,
 unlock_out:
 	mutex_unlock(&group->mutex);
 }
+EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
+
+/**
+ * iommu_group_dma_owner_unclaimed() - Is group dma ownership claimed
+ * @group: The group.
+ *
+ * This provides a status check on a given group. It is racy and only for
+ * non-binding status reporting.
+ */
+bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+	unsigned int user;
+
+	mutex_lock(&group->mutex);
+	user = group->owner_cnt;
+	mutex_unlock(&group->mutex);
+
+	return !user;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed);
 
 /**
  * iommu_device_set_dma_owner() - Set DMA ownership of a device
-- 
2.25.1
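
Why the "unclaimed" helper is only suitable for status reporting can be
seen in a small userspace model. This is a sketch under assumptions:
struct group, STATUS_VIABLE and the functions below are hypothetical
stand-ins for the kernel interfaces, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum dma_owner { OWNER_DMA_API, OWNER_PRIVATE_DOMAIN_USER };

struct group {
	enum dma_owner owner;
	unsigned int owner_cnt;
};

#define STATUS_VIABLE (1u << 0)  /* hypothetical stand-in flag */

/* Snapshot only: the answer may be stale as soon as it is returned. */
static bool group_dma_owner_unclaimed(const struct group *g)
{
	return g->owner_cnt == 0;
}

/* What a status query might report to userspace. */
static unsigned int group_status(const struct group *g)
{
	return group_dma_owner_unclaimed(g) ? STATUS_VIABLE : 0;
}

/* The authoritative check happens when ownership is actually claimed. */
static int group_set_dma_owner(struct group *g, enum dma_owner owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}
```

If a kernel driver claims the group between the status query and the
user claim, the earlier "viable" snapshot is simply out of date and the
later claim fails with -EBUSY, so callers in this model must still
check the return value of the claim itself.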


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 06/13] iommu: Expose group variants of dma ownership interfaces
@ 2021-12-17  6:37   ` Lu Baolu
  0 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: kvm, rafael, David Airlie, linux-pci, Thierry Reding,
	Diana Craciun, Dmitry Osipenko, Will Deacon, Stuart Yoder,
	Jonathan Hunter, Chaitanya Kulkarni, Dan Williams, Cornelia Huck,
	linux-kernel, Li Yang, iommu, Jacob jun Pan, Daniel Vetter,
	Robin Murphy

vfio needs to set DMA_OWNER_PRIVATE_DOMAIN_USER for the entire group
when attaching it to a vfio container. Expose group variants of the
set/release DMA ownership interfaces for this purpose.

This also exposes the helper iommu_group_dma_owner_unclaimed() so that
vfio can report to userspace whether the group is viable for user
assignment, for compatibility with VFIO_GROUP_FLAGS_VIABLE.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 21 +++++++++++++++++++
 drivers/iommu/iommu.c | 47 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 53a023ee1ac0..5ad4cf13370d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -699,6 +699,10 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
 			       void *owner_cookie);
 void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
+			      void *owner_cookie);
+void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
+bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -1115,6 +1119,23 @@ static inline void iommu_device_release_dma_owner(struct device *dev,
 						  enum iommu_dma_owner owner)
 {
 }
+
+static inline int iommu_group_set_dma_owner(struct iommu_group *group,
+					    enum iommu_dma_owner owner,
+					    void *owner_cookie)
+{
+	return -EINVAL;
+}
+
+static inline void iommu_group_release_dma_owner(struct iommu_group *group,
+						 enum iommu_dma_owner owner)
+{
+}
+
+static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+	return false;
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 573e253bad51..8bec71b1cc18 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3365,9 +3365,19 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	return ret;
 }
 
-static int iommu_group_set_dma_owner(struct iommu_group *group,
-				     enum iommu_dma_owner owner,
-				     void *owner_cookie)
+/**
+ * iommu_group_set_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: DMA owner type.
+ * @owner_cookie: Caller specified pointer. Could be used for exclusive
+ *                declaration. Could be NULL.
+ *
+ * This exists for backward compatibility with legacy vfio, which manages
+ * DMA ownership at group level. New callers should not use this interface;
+ * use iommu_device_set_dma_owner() instead.
+ */
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
+			      void *owner_cookie)
 {
 	int ret = 0;
 
@@ -3398,9 +3408,16 @@ static int iommu_group_set_dma_owner(struct iommu_group *group,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_group_set_dma_owner);
 
-static void iommu_group_release_dma_owner(struct iommu_group *group,
-					  enum iommu_dma_owner owner)
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ * @owner: DMA owner type.
+ *
+ * Release the DMA ownership claimed by iommu_group_set_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner)
 {
 	mutex_lock(&group->mutex);
 	if (WARN_ON(!group->owner_cnt || group->dma_owner != owner))
@@ -3423,6 +3440,26 @@ static void iommu_group_release_dma_owner(struct iommu_group *group,
 unlock_out:
 	mutex_unlock(&group->mutex);
 }
+EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
+
+/**
+ * iommu_group_dma_owner_unclaimed() - Is group DMA ownership unclaimed
+ * @group: The group.
+ *
+ * This provides a status check on a given group. It is racy and only for
+ * non-binding status reporting.
+ */
+bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+	unsigned int user;
+
+	mutex_lock(&group->mutex);
+	user = group->owner_cnt;
+	mutex_unlock(&group->mutex);
+
+	return !user;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed);
 
 /**
  * iommu_device_set_dma_owner() - Set DMA ownership of a device
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread
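
The ownership bookkeeping that the group interfaces in patch 06 expose can be pictured with a small userspace model. This is only a sketch under assumptions: the struct, enum and function names below are hypothetical stand-ins, a pthread mutex replaces the kernel mutex, and the claim rule (same owner type and same cookie may claim again, anything else gets -EBUSY) is an assumption, since the body of iommu_group_set_dma_owner() is not part of this patch's context. The release path follows the check visible in the diff, except that a mismatch is silently ignored where the kernel would WARN_ON.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for enum iommu_dma_owner. */
enum owner_model {
	OWNER_NONE,
	OWNER_DMA_API,
	OWNER_PRIVATE_DOMAIN_USER,
};

/* Hypothetical stand-in for the ownership fields of struct iommu_group. */
struct own_group {
	pthread_mutex_t mutex;
	enum owner_model dma_owner;
	unsigned int owner_cnt;
	void *owner_cookie;
};

void own_group_init(struct own_group *g)
{
	pthread_mutex_init(&g->mutex, NULL);
	g->dma_owner = OWNER_NONE;
	g->owner_cnt = 0;
	g->owner_cookie = NULL;
}

/* Assumed claim rule: a repeat claim must match both owner type and cookie. */
int group_set_owner(struct own_group *g, enum owner_model owner, void *cookie)
{
	int ret = 0;

	pthread_mutex_lock(&g->mutex);
	if (g->owner_cnt &&
	    (g->dma_owner != owner || g->owner_cookie != cookie)) {
		ret = -EBUSY;
	} else {
		g->dma_owner = owner;
		g->owner_cookie = cookie;
		g->owner_cnt++;
	}
	pthread_mutex_unlock(&g->mutex);

	return ret;
}

/* A release that does not match the current owner is ignored in this model. */
void group_release_owner(struct own_group *g, enum owner_model owner)
{
	pthread_mutex_lock(&g->mutex);
	if (g->owner_cnt && g->dma_owner == owner) {
		if (--g->owner_cnt == 0) {
			g->dma_owner = OWNER_NONE;
			g->owner_cookie = NULL;
		}
	}
	pthread_mutex_unlock(&g->mutex);
}

/*
 * Snapshot of the counter. The answer can be stale as soon as the lock is
 * dropped, which is why the helper is only good for status reporting.
 */
bool group_owner_unclaimed(struct own_group *g)
{
	unsigned int cnt;

	pthread_mutex_lock(&g->mutex);
	cnt = g->owner_cnt;
	pthread_mutex_unlock(&g->mutex);

	return !cnt;
}
```

A caller that needs a binding answer should attempt the claim itself and check the return value rather than test group_owner_unclaimed() first.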

* [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

The iommu_attach/detach_device() interfaces were exposed for device
drivers to attach/detach their own domains. Commit 426a273834eae
("iommu: Limit iommu_attach/detach_device to device with their own group")
restricted them to singleton groups to prevent different devices in a
group from attaching different domains.

Now that device DMA ownership has been introduced into the iommu core, we
can introduce interfaces for multiple-device groups while still
guaranteeing that all devices are in the same address space.

The iommu_attach/detach_device_shared() interfaces can be used when
multiple drivers sharing the group claim the DMA_OWNER_PRIVATE_DOMAIN
ownership. The first call of iommu_attach_device_shared() attaches the
domain to the group. Other drivers can join it later. The domain is
detached from the group after all drivers have detached from it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
---
 include/linux/iommu.h | 13 +++++++
 drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5ad4cf13370d..1bc03118dfb3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -703,6 +703,8 @@ int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner ow
 			      void *owner_cookie);
 void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
 bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
+int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev);
+void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -743,11 +745,22 @@ static inline int iommu_attach_device(struct iommu_domain *domain,
 	return -ENODEV;
 }
 
+static inline int iommu_attach_device_shared(struct iommu_domain *domain,
+					     struct device *dev)
+{
+	return -ENODEV;
+}
+
 static inline void iommu_detach_device(struct iommu_domain *domain,
 				       struct device *dev)
 {
 }
 
+static inline void iommu_detach_device_shared(struct iommu_domain *domain,
+					      struct device *dev)
+{
+}
+
 static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 {
 	return NULL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8bec71b1cc18..3ad66cb9bedc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -50,6 +50,7 @@ struct iommu_group {
 	struct list_head entry;
 	enum iommu_dma_owner dma_owner;
 	unsigned int owner_cnt;
+	unsigned int attach_cnt;
 	void *owner_cookie;
 };
 
@@ -3512,3 +3513,81 @@ void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner own
 	iommu_group_put(group);
 }
 EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
+
+/**
+ * iommu_attach_device_shared() - Attach shared domain to a device
+ * @domain: The shared domain.
+ * @dev: The device.
+ *
+ * Similar to iommu_attach_device(), but allowed for devices in a shared
+ * group. It guarantees that all devices in an iommu group are attached to
+ * the same iommu domain. The caller should explicitly set DMA ownership of
+ * type DMA_OWNER_PRIVATE_DOMAIN or DMA_OWNER_PRIVATE_DOMAIN_USER before
+ * calling it, and use the paired helper iommu_detach_device_shared() for
+ * cleanup.
+ */
+int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_group *group;
+	int ret = 0;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	mutex_lock(&group->mutex);
+	if (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
+	    group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER) {
+		ret = -EPERM;
+		goto unlock_out;
+	}
+
+	if (group->attach_cnt) {
+		if (group->domain != domain) {
+			ret = -EBUSY;
+			goto unlock_out;
+		}
+	} else {
+		ret = __iommu_attach_group(domain, group);
+		if (ret)
+			goto unlock_out;
+	}
+
+	group->attach_cnt++;
+unlock_out:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_attach_device_shared);
+
+/**
+ * iommu_detach_device_shared() - Detach a shared domain from a device
+ * @domain: The domain.
+ * @dev: The device.
+ *
+ * The detach helper paired with iommu_attach_device_shared().
+ */
+void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_group *group;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return;
+
+	mutex_lock(&group->mutex);
+	if (WARN_ON(!group->attach_cnt || group->domain != domain ||
+		    (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
+		     group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER)))
+		goto unlock_out;
+
+	if (--group->attach_cnt == 0)
+		__iommu_detach_group(domain, group);
+
+unlock_out:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_device_shared);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread
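
The attach counting added by patch 07 can be tried out in userspace with a reduced model. What follows is a sketch, not the kernel code: the types and function names are hypothetical, locking and group reference counting are left out, and setting the domain pointer stands in for __iommu_attach_group() and __iommu_detach_group(). The model uses NULL for "no private domain attached", which is a simplification. The branch structure and error codes follow the diff above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for enum iommu_dma_owner. */
enum dma_owner_model {
	MODEL_OWNER_NONE,
	MODEL_OWNER_DMA_API,
	MODEL_OWNER_PRIVATE_DOMAIN,
	MODEL_OWNER_PRIVATE_DOMAIN_USER,
};

struct domain_model {
	int id;
};

/* Hypothetical stand-in for the relevant fields of struct iommu_group. */
struct group_model {
	enum dma_owner_model dma_owner;
	struct domain_model *domain;	/* attached private domain, or NULL */
	unsigned int attach_cnt;	/* number of shared attachments */
};

int owner_allows_private_domain(const struct group_model *g)
{
	return g->dma_owner == MODEL_OWNER_PRIVATE_DOMAIN ||
	       g->dma_owner == MODEL_OWNER_PRIVATE_DOMAIN_USER;
}

/* Follows the control flow of iommu_attach_device_shared(). */
int attach_shared(struct group_model *g, struct domain_model *d)
{
	if (!owner_allows_private_domain(g))
		return -EPERM;

	if (g->attach_cnt) {
		/* Later callers may only join the domain already attached. */
		if (g->domain != d)
			return -EBUSY;
	} else {
		/* First caller attaches the domain to the whole group. */
		g->domain = d;
	}

	g->attach_cnt++;
	return 0;
}

/* Follows iommu_detach_device_shared(); the kernel would WARN_ON a misuse. */
void detach_shared(struct group_model *g, struct domain_model *d)
{
	if (!g->attach_cnt || g->domain != d ||
	    !owner_allows_private_domain(g))
		return;

	/* Only the last detach removes the domain from the group. */
	if (--g->attach_cnt == 0)
		g->domain = NULL;
}
```

Two drivers sharing a group would each call attach_shared() with the same domain and each call detach_shared() once; the domain stays attached until the second detach.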

* [PATCH v4 08/13] vfio: Set DMA USER ownership for VFIO devices
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

Set DMA_OWNER_PRIVATE_DOMAIN_USER when an iommu group is set to a
container, and release it once the iommu group is unset from the
container.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c     |  1 +
 drivers/vfio/pci/vfio_pci.c           |  3 +++
 drivers/vfio/platform/vfio_amba.c     |  1 +
 drivers/vfio/platform/vfio_platform.c |  1 +
 drivers/vfio/vfio.c                   | 13 ++++++++++++-
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index 6e2e62c6f47a..b749d092a185 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -587,6 +587,7 @@ static struct fsl_mc_driver vfio_fsl_mc_driver = {
 	.driver	= {
 		.name	= "vfio-fsl-mc",
 		.owner	= THIS_MODULE,
+		.suppress_auto_claim_dma_owner = true,
 	},
 };
 
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index a5ce92beb655..ce3e814b5506 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -193,6 +193,9 @@ static struct pci_driver vfio_pci_driver = {
 	.remove			= vfio_pci_remove,
 	.sriov_configure	= vfio_pci_sriov_configure,
 	.err_handler		= &vfio_pci_core_err_handlers,
+	.driver			= {
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static void __init vfio_pci_fill_ids(void)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index badfffea14fb..2146ee52901a 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -94,6 +94,7 @@ static struct amba_driver vfio_amba_driver = {
 	.drv = {
 		.name = "vfio-amba",
 		.owner = THIS_MODULE,
+		.suppress_auto_claim_dma_owner = true,
 	},
 };
 
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 68a1c87066d7..5ef06e668192 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -75,6 +75,7 @@ static struct platform_driver vfio_platform_driver = {
 	.remove		= vfio_platform_remove,
 	.driver	= {
 		.name	= "vfio-platform",
+		.suppress_auto_claim_dma_owner = true,
 	},
 };
 
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 735d1d344af9..b75ba7551079 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1198,6 +1198,9 @@ static void __vfio_group_unset_container(struct vfio_group *group)
 		driver->ops->detach_group(container->iommu_data,
 					  group->iommu_group);
 
+	iommu_group_release_dma_owner(group->iommu_group,
+				      DMA_OWNER_PRIVATE_DOMAIN_USER);
+
 	group->container = NULL;
 	wake_up(&group->container_q);
 	list_del(&group->container_next);
@@ -1282,13 +1285,21 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 		goto unlock_out;
 	}
 
+	ret = iommu_group_set_dma_owner(group->iommu_group,
+					DMA_OWNER_PRIVATE_DOMAIN_USER, f.file);
+	if (ret)
+		goto unlock_out;
+
 	driver = container->iommu_driver;
 	if (driver) {
 		ret = driver->ops->attach_group(container->iommu_data,
 						group->iommu_group,
 						group->type);
-		if (ret)
+		if (ret) {
+			iommu_group_release_dma_owner(group->iommu_group,
+						      DMA_OWNER_PRIVATE_DOMAIN_USER);
 			goto unlock_out;
+		}
 	}
 
 	group->container = container;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread
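
The ordering that patch 08 establishes in vfio_group_set_container() is: claim user ownership first, then attach, and drop the claim again if the attach fails. A minimal userspace sketch of that unwinding is given below. All names are hypothetical, the ownership claim is reduced to a boolean, and the optional attach callback stands in for the container's iommu driver, which may be absent just as in the diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced view of a vfio group. */
struct vgroup_model {
	bool owned;		/* user DMA ownership claimed */
	bool has_container;	/* group is set to a container */
};

/* Stand-in for iommu_group_set_dma_owner(); fails if already claimed. */
int claim_user_ownership(struct vgroup_model *g)
{
	if (g->owned)
		return -EBUSY;
	g->owned = true;
	return 0;
}

/* Stand-in for iommu_group_release_dma_owner(). */
void release_user_ownership(struct vgroup_model *g)
{
	g->owned = false;
}

/* Example attach callbacks, one succeeding and one failing. */
int attach_ok(struct vgroup_model *g)
{
	(void)g;
	return 0;
}

int attach_fail(struct vgroup_model *g)
{
	(void)g;
	return -ENOMEM;
}

/* Claim first, attach second, release the claim if the attach fails. */
int set_container_model(struct vgroup_model *g,
			int (*attach)(struct vgroup_model *))
{
	int ret;

	ret = claim_user_ownership(g);
	if (ret)
		return ret;

	if (attach) {
		ret = attach(g);
		if (ret) {
			release_user_ownership(g);
			return ret;
		}
	}

	g->has_container = true;
	return 0;
}

/* Release happens after the detach step, mirroring the unset path. */
void unset_container_model(struct vgroup_model *g)
{
	release_user_ownership(g);
	g->has_container = false;
}
```

The point of the unwinding is that a failed attach leaves the group unclaimed, so a kernel driver can still bind to its devices afterwards.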

* [PATCH v4 09/13] vfio: Remove use of vfio_group_viable()
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

As DMA USER ownership is claimed for the iommu group when a vfio group is
added to a vfio container, the vfio group viability is guaranteed as long
as group->container_users > 0. Remove those unnecessary group viability
checks which are only hit when group->container_users is not zero.

The only remaining reference is in GROUP_GET_STATUS, which can be called
at any time while the group fd is valid. There, replace the
vfio_group_viable() call with a direct call into the iommu core to get the
viability status.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b75ba7551079..241756b85705 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1316,12 +1316,6 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 	return ret;
 }
 
-static bool vfio_group_viable(struct vfio_group *group)
-{
-	return (iommu_group_for_each_dev(group->iommu_group,
-					 group, vfio_dev_viable) == 0);
-}
-
 static int vfio_group_add_container_user(struct vfio_group *group)
 {
 	if (!atomic_inc_not_zero(&group->container_users))
@@ -1331,7 +1325,7 @@ static int vfio_group_add_container_user(struct vfio_group *group)
 		atomic_dec(&group->container_users);
 		return -EPERM;
 	}
-	if (!group->container->iommu_driver || !vfio_group_viable(group)) {
+	if (!group->container->iommu_driver) {
 		atomic_dec(&group->container_users);
 		return -EINVAL;
 	}
@@ -1349,7 +1343,7 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	int ret = 0;
 
 	if (0 == atomic_read(&group->container_users) ||
-	    !group->container->iommu_driver || !vfio_group_viable(group))
+	    !group->container->iommu_driver)
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
@@ -1441,11 +1435,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 
 		status.flags = 0;
 
-		if (vfio_group_viable(group))
-			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
-
 		if (group->container)
-			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET;
+			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
+					VFIO_GROUP_FLAGS_VIABLE;
+		else if (iommu_group_dma_owner_unclaimed(group->iommu_group))
+			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
 
 		if (copy_to_user((void __user *)arg, &status, minsz))
 			return -EFAULT;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread
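
The new flag computation in the GROUP_GET_STATUS path of patch 09 reduces to a two-input decision, sketched here as a standalone function. The function name and the MODEL_* constants are hypothetical, and their bit positions are chosen only for this illustration; the two booleans stand for "group->container is set" and the result of iommu_group_dma_owner_unclaimed().

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; assumptions of this sketch, not the uAPI header. */
#define MODEL_FLAGS_VIABLE		(1u << 0)
#define MODEL_FLAGS_CONTAINER_SET	(1u << 1)

/*
 * A group with a container already holds user DMA ownership, so it is
 * reported as viable without asking the iommu core. Otherwise the group is
 * viable only while nobody has claimed DMA ownership of it.
 */
uint32_t group_status_flags(bool container_set, bool owner_unclaimed)
{
	uint32_t flags = 0;

	if (container_set)
		flags |= MODEL_FLAGS_CONTAINER_SET | MODEL_FLAGS_VIABLE;
	else if (owner_unclaimed)
		flags |= MODEL_FLAGS_VIABLE;

	return flags;
}
```

Because the unclaimed check is a snapshot, a viable result for a group without a container is advisory; the binding check happens when the group is actually set to a container.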

* [PATCH v4 10/13] vfio: Delete the unbound_list
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

From: Jason Gunthorpe <jgg@nvidia.com>

Commit 60720a0fc646 ("vfio: Add device tracking during unbind") added the
unbound list to plug a problem with KVM where KVM_DEV_VFIO_GROUP_DEL
relied on vfio_group_get_external_user() succeeding to return the
vfio_group from a group file descriptor. The unbound list allowed
vfio_group_get_external_user() to continue to succeed in edge cases.

However, commit 5d6dee80a1e9 ("vfio: New external user group/file match")
deleted the call to vfio_group_get_external_user() during
KVM_DEV_VFIO_GROUP_DEL. Instead vfio_external_group_match_file() is used
to directly match the file descriptor to the group pointer.

This in turn avoids the call down to vfio_dev_viable() during
KVM_DEV_VFIO_GROUP_DEL and also avoids the trouble the first commit was
trying to fix.

There are no other users of vfio_dev_viable() that care about the time
after vfio_unregister_group_dev() returns, so simply delete the
unbound_list entirely.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 74 ++-------------------------------------------
 1 file changed, 2 insertions(+), 72 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 241756b85705..6426b29e73a2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -62,11 +62,6 @@ struct vfio_container {
 	bool				noiommu;
 };
 
-struct vfio_unbound_dev {
-	struct device			*dev;
-	struct list_head		unbound_next;
-};
-
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -79,8 +74,6 @@ struct vfio_group {
 	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
-	struct list_head		unbound_list;
-	struct mutex			unbound_lock;
 	atomic_t			opened;
 	wait_queue_head_t		container_q;
 	enum vfio_group_type		type;
@@ -340,16 +333,8 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 static void vfio_group_release(struct device *dev)
 {
 	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
-	struct vfio_unbound_dev *unbound, *tmp;
-
-	list_for_each_entry_safe(unbound, tmp,
-				 &group->unbound_list, unbound_next) {
-		list_del(&unbound->unbound_next);
-		kfree(unbound);
-	}
 
 	mutex_destroy(&group->device_lock);
-	mutex_destroy(&group->unbound_lock);
 	iommu_group_put(group->iommu_group);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
@@ -381,8 +366,6 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 	refcount_set(&group->users, 1);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
-	INIT_LIST_HEAD(&group->unbound_list);
-	mutex_init(&group->unbound_lock);
 	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
 	/* put in vfio_group_release() */
@@ -571,19 +554,8 @@ static int vfio_dev_viable(struct device *dev, void *data)
 	struct vfio_group *group = data;
 	struct vfio_device *device;
 	struct device_driver *drv = READ_ONCE(dev->driver);
-	struct vfio_unbound_dev *unbound;
-	int ret = -EINVAL;
 
-	mutex_lock(&group->unbound_lock);
-	list_for_each_entry(unbound, &group->unbound_list, unbound_next) {
-		if (dev == unbound->dev) {
-			ret = 0;
-			break;
-		}
-	}
-	mutex_unlock(&group->unbound_lock);
-
-	if (!ret || !drv || vfio_dev_driver_allowed(dev, drv))
+	if (!drv || vfio_dev_driver_allowed(dev, drv))
 		return 0;
 
 	device = vfio_group_get_device(group, dev);
@@ -592,7 +564,7 @@ static int vfio_dev_viable(struct device *dev, void *data)
 		return 0;
 	}
 
-	return ret;
+	return -EINVAL;
 }
 
 /*
@@ -634,7 +606,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 {
 	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
 	struct device *dev = data;
-	struct vfio_unbound_dev *unbound;
 
 	switch (action) {
 	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
@@ -663,28 +634,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 			__func__, iommu_group_id(group->iommu_group),
 			dev->driver->name);
 		break;
-	case IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbound from driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		/*
-		 * XXX An unbound device in a live group is ok, but we'd
-		 * really like to avoid the above BUG_ON by preventing other
-		 * drivers from binding to it.  Once that occurs, we have to
-		 * stop the system to maintain isolation.  At a minimum, we'd
-		 * want a toggle to disable driver auto probe for this device.
-		 */
-
-		mutex_lock(&group->unbound_lock);
-		list_for_each_entry(unbound,
-				    &group->unbound_list, unbound_next) {
-			if (dev == unbound->dev) {
-				list_del(&unbound->unbound_next);
-				kfree(unbound);
-				break;
-			}
-		}
-		mutex_unlock(&group->unbound_lock);
-		break;
 	}
 	return NOTIFY_OK;
 }
@@ -889,29 +838,10 @@ static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 void vfio_unregister_group_dev(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
-	struct vfio_unbound_dev *unbound;
 	unsigned int i = 0;
 	bool interrupted = false;
 	long rc;
 
-	/*
-	 * When the device is removed from the group, the group suddenly
-	 * becomes non-viable; the device has a driver (until the unbind
-	 * completes), but it's not present in the group.  This is bad news
-	 * for any external users that need to re-acquire a group reference
-	 * in order to match and release their existing reference.  To
-	 * solve this, we track such devices on the unbound_list to bridge
-	 * the gap until they're fully unbound.
-	 */
-	unbound = kzalloc(sizeof(*unbound), GFP_KERNEL);
-	if (unbound) {
-		unbound->dev = device->dev;
-		mutex_lock(&group->unbound_lock);
-		list_add(&unbound->unbound_next, &group->unbound_list);
-		mutex_unlock(&group->unbound_lock);
-	}
-	WARN_ON(!unbound);
-
 	vfio_device_put(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 11/13] vfio: Remove iommu group notifier
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
has been called. There is no longer any need to register an iommu group
notifier, so remove it along with the BUG_ON() and WARN() it contains.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
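The rule that makes the notifier redundant can be pictured with a small
userspace model. This is a sketch under assumptions, not the interface
added earlier in the series: the enum values, struct model_group and the
model_group_* helpers are all hypothetical names. The only idea borrowed
from the series is that a group has a single kind of DMA owner at a time,
so a conflicting claim is refused up front instead of being detected
after the fact.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical owner kinds; the names are illustrative only. */
enum model_owner {
	MODEL_OWNER_NONE,
	MODEL_OWNER_KERNEL_DMA,		/* a kernel driver using the DMA API */
	MODEL_OWNER_PRIVATE_USER,	/* group handed to userspace */
};

struct model_group {
	enum model_owner owner;
	unsigned int owner_cnt;		/* number of outstanding claims */
};

/* Claim the group; a claim of a different kind than the current one fails. */
static int model_group_set_owner(struct model_group *g, enum model_owner want)
{
	assert(g != NULL && want != MODEL_OWNER_NONE);

	if (g->owner_cnt && g->owner != want)
		return -EBUSY;

	g->owner = want;
	g->owner_cnt++;
	return 0;
}

/* Drop one claim; the group becomes unowned when the last claim goes. */
static void model_group_release_owner(struct model_group *g)
{
	assert(g != NULL && g->owner_cnt > 0);

	if (--g->owner_cnt == 0)
		g->owner = MODEL_OWNER_NONE;
}
```

In this model a kernel driver trying to bind while userspace owns the
group simply gets -EBUSY, so there is nothing left for a BOUND_DRIVER
notifier to police.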
 drivers/vfio/vfio.c | 147 --------------------------------------------
 1 file changed, 147 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6426b29e73a2..539f5da9eb34 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -71,7 +71,6 @@ struct vfio_group {
 	struct vfio_container		*container;
 	struct list_head		device_list;
 	struct mutex			device_lock;
-	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
 	atomic_t			opened;
@@ -274,8 +273,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
 
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data);
 static void vfio_group_get(struct vfio_group *group);
 
 /*
@@ -395,13 +392,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 		goto err_put;
 	}
 
-	group->nb.notifier_call = vfio_iommu_group_notifier;
-	err = iommu_group_register_notifier(iommu_group, &group->nb);
-	if (err) {
-		ret = ERR_PTR(err);
-		goto err_put;
-	}
-
 	mutex_lock(&vfio.group_lock);
 
 	/* Did we race creating this group? */
@@ -422,7 +412,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 err_unlock:
 	mutex_unlock(&vfio.group_lock);
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 err_put:
 	put_device(&group->dev);
 	return ret;
@@ -447,7 +436,6 @@ static void vfio_group_put(struct vfio_group *group)
 	cdev_device_del(&group->cdev, &group->dev);
 	mutex_unlock(&vfio.group_lock);
 
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 	put_device(&group->dev);
 }
 
@@ -503,141 +491,6 @@ static struct vfio_device *vfio_group_get_device(struct vfio_group *group,
 	return NULL;
 }
 
-/*
- * Some drivers, like pci-stub, are only used to prevent other drivers from
- * claiming a device and are therefore perfectly legitimate for a user owned
- * group.  The pci-stub driver has no dependencies on DMA or the IOVA mapping
- * of the device, but it does prevent the user from having direct access to
- * the device, which is useful in some circumstances.
- *
- * We also assume that we can include PCI interconnect devices, ie. bridges.
- * IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
- * then all of the downstream devices will be part of the same IOMMU group as
- * the bridge.  Thus, if placing the bridge into the user owned IOVA space
- * breaks anything, it only does so for user owned devices downstream.  Note
- * that error notification via MSI can be affected for platforms that handle
- * MSI within the same IOVA space as DMA.
- */
-static const char * const vfio_driver_allowed[] = { "pci-stub" };
-
-static bool vfio_dev_driver_allowed(struct device *dev,
-				    struct device_driver *drv)
-{
-	if (dev_is_pci(dev)) {
-		struct pci_dev *pdev = to_pci_dev(dev);
-
-		if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-			return true;
-	}
-
-	return match_string(vfio_driver_allowed,
-			    ARRAY_SIZE(vfio_driver_allowed),
-			    drv->name) >= 0;
-}
-
-/*
- * A vfio group is viable for use by userspace if all devices are in
- * one of the following states:
- *  - driver-less
- *  - bound to a vfio driver
- *  - bound to an otherwise allowed driver
- *  - a PCI interconnect device
- *
- * We use two methods to determine whether a device is bound to a vfio
- * driver.  The first is to test whether the device exists in the vfio
- * group.  The second is to test if the device exists on the group
- * unbound_list, indicating it's in the middle of transitioning from
- * a vfio driver to driver-less.
- */
-static int vfio_dev_viable(struct device *dev, void *data)
-{
-	struct vfio_group *group = data;
-	struct vfio_device *device;
-	struct device_driver *drv = READ_ONCE(dev->driver);
-
-	if (!drv || vfio_dev_driver_allowed(dev, drv))
-		return 0;
-
-	device = vfio_group_get_device(group, dev);
-	if (device) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	return -EINVAL;
-}
-
-/*
- * Async device support
- */
-static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
-{
-	struct vfio_device *device;
-
-	/* Do we already know about it?  We shouldn't */
-	device = vfio_group_get_device(group, dev);
-	if (WARN_ON_ONCE(device)) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	/* Nothing to do for idle groups */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	/* TODO Prevent device auto probing */
-	dev_WARN(dev, "Device added to live group %d!\n",
-		 iommu_group_id(group->iommu_group));
-
-	return 0;
-}
-
-static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
-{
-	/* We don't care what happens when the group isn't in use */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	return vfio_dev_viable(dev, group);
-}
-
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data)
-{
-	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
-	struct device *dev = data;
-
-	switch (action) {
-	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
-		vfio_group_nb_add_dev(group, dev);
-		break;
-	case IOMMU_GROUP_NOTIFY_DEL_DEVICE:
-		/*
-		 * Nothing to do here.  If the device is in use, then the
-		 * vfio sub-driver should block the remove callback until
-		 * it is unused.  If the device is unused or attached to a
-		 * stub driver, then it should be released and we don't
-		 * care that it will be going away.
-		 */
-		break;
-	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
-		dev_dbg(dev, "%s: group %d binding to driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		break;
-	case IOMMU_GROUP_NOTIFY_BOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d bound to driver %s\n", __func__,
-			iommu_group_id(group->iommu_group), dev->driver->name);
-		BUG_ON(vfio_group_nb_verify(group, dev));
-		break;
-	case IOMMU_GROUP_NOTIFY_UNBIND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbinding from driver %s\n",
-			__func__, iommu_group_id(group->iommu_group),
-			dev->driver->name);
-		break;
-	}
-	return NOTIFY_OK;
-}
-
 /*
  * VFIO driver API
  */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 12/13] iommu: Remove iommu group changes notifier
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu, Christoph Hellwig

The iommu group change notifier is no longer referenced anywhere in the
tree. Remove it to avoid dead code.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
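For context, the republishing step being deleted can be sketched in
userspace C as below. The group action codes are copied from the
IOMMU_GROUP_NOTIFY_* defines removed by this patch. Everything else is a
hypothetical model: the model_bus_event values are arbitrary rather than
the real BUS_NOTIFY_* constants, and struct model_group_chain replaces
the blocking notifier head with a simple listener count.

```c
#include <assert.h>
#include <stddef.h>

/* Action codes copied from the removed IOMMU_GROUP_NOTIFY_* defines. */
enum model_group_action {
	MODEL_GROUP_NONE           = 0,
	MODEL_GROUP_BIND_DRIVER    = 3,
	MODEL_GROUP_BOUND_DRIVER   = 4,
	MODEL_GROUP_UNBIND_DRIVER  = 5,
	MODEL_GROUP_UNBOUND_DRIVER = 6,
};

/* Hypothetical bus events; the numeric values are arbitrary. */
enum model_bus_event {
	MODEL_BUS_DEL_DEVICE,
	MODEL_BUS_BIND_DRIVER,
	MODEL_BUS_BOUND_DRIVER,
	MODEL_BUS_UNBIND_DRIVER,
	MODEL_BUS_UNBOUND_DRIVER,
};

/* Stand-in for the per-group notifier chain. */
struct model_group_chain {
	unsigned int listeners;	/* registered notifier blocks */
	unsigned int delivered;	/* callbacks invoked so far */
};

/* Models the switch that filtered bus events into group actions. */
static enum model_group_action model_translate(enum model_bus_event ev)
{
	switch (ev) {
	case MODEL_BUS_BIND_DRIVER:
		return MODEL_GROUP_BIND_DRIVER;
	case MODEL_BUS_BOUND_DRIVER:
		return MODEL_GROUP_BOUND_DRIVER;
	case MODEL_BUS_UNBIND_DRIVER:
		return MODEL_GROUP_UNBIND_DRIVER;
	case MODEL_BUS_UNBOUND_DRIVER:
		return MODEL_GROUP_UNBOUND_DRIVER;
	default:
		return MODEL_GROUP_NONE;
	}
}

/* Models republishing: every listener is called for a translated event. */
static void model_bus_notifier(struct model_group_chain *chain,
			       enum model_bus_event ev)
{
	assert(chain != NULL);

	if (model_translate(ev) != MODEL_GROUP_NONE)
		chain->delivered += chain->listeners;
}
```

With no remaining caller registering on the chain, listeners stays at
zero and the translation work never reaches anyone, which is why the
whole path can go.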
 include/linux/iommu.h | 23 -------------
 drivers/iommu/iommu.c | 75 -------------------------------------------
 2 files changed, 98 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1bc03118dfb3..860ac545ac77 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -417,13 +417,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 	};
 }
 
-#define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
-#define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
-#define IOMMU_GROUP_NOTIFY_BIND_DRIVER		3 /* Pre Driver bind */
-#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
-#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
-#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
-
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
@@ -496,10 +489,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
 extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
-extern int iommu_group_register_notifier(struct iommu_group *group,
-					 struct notifier_block *nb);
-extern int iommu_group_unregister_notifier(struct iommu_group *group,
-					   struct notifier_block *nb);
 extern int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
 					void *data);
@@ -913,18 +902,6 @@ static inline void iommu_group_put(struct iommu_group *group)
 {
 }
 
-static inline int iommu_group_register_notifier(struct iommu_group *group,
-						struct notifier_block *nb)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_group_unregister_notifier(struct iommu_group *group,
-						  struct notifier_block *nb)
-{
-	return 0;
-}
-
 static inline
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ad66cb9bedc..053f6ea54fbd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -18,7 +18,6 @@
 #include <linux/errno.h>
 #include <linux/iommu.h>
 #include <linux/idr.h>
-#include <linux/notifier.h>
 #include <linux/err.h>
 #include <linux/pci.h>
 #include <linux/bitops.h>
@@ -40,7 +39,6 @@ struct iommu_group {
 	struct kobject *devices_kobj;
 	struct list_head devices;
 	struct mutex mutex;
-	struct blocking_notifier_head notifier;
 	void *iommu_data;
 	void (*iommu_data_release)(void *iommu_data);
 	char *name;
@@ -629,7 +627,6 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
-	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0) {
@@ -904,10 +901,6 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	if (ret)
 		goto err_put_group;
 
-	/* Notify any listeners about change to group. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
-
 	trace_add_device_to_group(group->id, dev);
 
 	dev_info(dev, "Adding to iommu group %d\n", group->id);
@@ -949,10 +942,6 @@ void iommu_group_remove_device(struct device *dev)
 
 	dev_info(dev, "Removing from iommu group %d\n", group->id);
 
-	/* Pre-notify listeners that a device is being removed. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
-
 	mutex_lock(&group->mutex);
 	list_for_each_entry(tmp_device, &group->devices, list) {
 		if (tmp_device->dev == dev) {
@@ -1075,36 +1064,6 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
-/**
- * iommu_group_register_notifier - Register a notifier for group changes
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * This function allows iommu group users to track changes in a group.
- * See include/linux/iommu.h for actions sent via this notifier.  Caller
- * should hold a reference to the group throughout notifier registration.
- */
-int iommu_group_register_notifier(struct iommu_group *group,
-				  struct notifier_block *nb)
-{
-	return blocking_notifier_chain_register(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
-
-/**
- * iommu_group_unregister_notifier - Unregister a notifier
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * Unregister a previously registered group notifier block.
- */
-int iommu_group_unregister_notifier(struct iommu_group *group,
-				    struct notifier_block *nb)
-{
-	return blocking_notifier_chain_unregister(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
-
 /**
  * iommu_register_device_fault_handler() - Register a device fault handler
  * @dev: the device
@@ -1653,14 +1612,8 @@ static int remove_iommu_group(struct device *dev, void *data)
 static int iommu_bus_notifier(struct notifier_block *nb,
 			      unsigned long action, void *data)
 {
-	unsigned long group_action = 0;
 	struct device *dev = data;
-	struct iommu_group *group;
 
-	/*
-	 * ADD/DEL call into iommu driver ops if provided, which may
-	 * result in ADD/DEL notifiers to group->notifier
-	 */
 	if (action == BUS_NOTIFY_ADD_DEVICE) {
 		int ret;
 
@@ -1671,34 +1624,6 @@ static int iommu_bus_notifier(struct notifier_block *nb,
 		return NOTIFY_OK;
 	}
 
-	/*
-	 * Remaining BUS_NOTIFYs get filtered and republished to the
-	 * group, if anyone is listening
-	 */
-	group = iommu_group_get(dev);
-	if (!group)
-		return 0;
-
-	switch (action) {
-	case BUS_NOTIFY_BIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
-		break;
-	case BUS_NOTIFY_BOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
-		break;
-	}
-
-	if (group_action)
-		blocking_notifier_call_chain(&group->notifier,
-					     group_action, dev);
-
-	iommu_group_put(group);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 12/13] iommu: Remove iommu group changes notifier
@ 2021-12-17  6:37   ` Lu Baolu
  0 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: kvm, rafael, David Airlie, linux-pci, Thierry Reding,
	Diana Craciun, Dmitry Osipenko, Will Deacon, Christoph Hellwig,
	Stuart Yoder, Jonathan Hunter, Chaitanya Kulkarni, Dan Williams,
	Cornelia Huck, linux-kernel, Li Yang, iommu, Jacob jun Pan,
	Daniel Vetter, Robin Murphy

The iommu group changes notifier is not referenced anywhere in the tree.
Remove it to avoid dead code.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 23 -------------
 drivers/iommu/iommu.c | 75 -------------------------------------------
 2 files changed, 98 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1bc03118dfb3..860ac545ac77 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -417,13 +417,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 	};
 }
 
-#define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
-#define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
-#define IOMMU_GROUP_NOTIFY_BIND_DRIVER		3 /* Pre Driver bind */
-#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
-#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
-#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
-
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
@@ -496,10 +489,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
 extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
-extern int iommu_group_register_notifier(struct iommu_group *group,
-					 struct notifier_block *nb);
-extern int iommu_group_unregister_notifier(struct iommu_group *group,
-					   struct notifier_block *nb);
 extern int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
 					void *data);
@@ -913,18 +902,6 @@ static inline void iommu_group_put(struct iommu_group *group)
 {
 }
 
-static inline int iommu_group_register_notifier(struct iommu_group *group,
-						struct notifier_block *nb)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_group_unregister_notifier(struct iommu_group *group,
-						  struct notifier_block *nb)
-{
-	return 0;
-}
-
 static inline
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ad66cb9bedc..053f6ea54fbd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -18,7 +18,6 @@
 #include <linux/errno.h>
 #include <linux/iommu.h>
 #include <linux/idr.h>
-#include <linux/notifier.h>
 #include <linux/err.h>
 #include <linux/pci.h>
 #include <linux/bitops.h>
@@ -40,7 +39,6 @@ struct iommu_group {
 	struct kobject *devices_kobj;
 	struct list_head devices;
 	struct mutex mutex;
-	struct blocking_notifier_head notifier;
 	void *iommu_data;
 	void (*iommu_data_release)(void *iommu_data);
 	char *name;
@@ -629,7 +627,6 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
-	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0) {
@@ -904,10 +901,6 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	if (ret)
 		goto err_put_group;
 
-	/* Notify any listeners about change to group. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
-
 	trace_add_device_to_group(group->id, dev);
 
 	dev_info(dev, "Adding to iommu group %d\n", group->id);
@@ -949,10 +942,6 @@ void iommu_group_remove_device(struct device *dev)
 
 	dev_info(dev, "Removing from iommu group %d\n", group->id);
 
-	/* Pre-notify listeners that a device is being removed. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
-
 	mutex_lock(&group->mutex);
 	list_for_each_entry(tmp_device, &group->devices, list) {
 		if (tmp_device->dev == dev) {
@@ -1075,36 +1064,6 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
-/**
- * iommu_group_register_notifier - Register a notifier for group changes
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * This function allows iommu group users to track changes in a group.
- * See include/linux/iommu.h for actions sent via this notifier.  Caller
- * should hold a reference to the group throughout notifier registration.
- */
-int iommu_group_register_notifier(struct iommu_group *group,
-				  struct notifier_block *nb)
-{
-	return blocking_notifier_chain_register(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
-
-/**
- * iommu_group_unregister_notifier - Unregister a notifier
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * Unregister a previously registered group notifier block.
- */
-int iommu_group_unregister_notifier(struct iommu_group *group,
-				    struct notifier_block *nb)
-{
-	return blocking_notifier_chain_unregister(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
-
 /**
  * iommu_register_device_fault_handler() - Register a device fault handler
  * @dev: the device
@@ -1653,14 +1612,8 @@ static int remove_iommu_group(struct device *dev, void *data)
 static int iommu_bus_notifier(struct notifier_block *nb,
 			      unsigned long action, void *data)
 {
-	unsigned long group_action = 0;
 	struct device *dev = data;
-	struct iommu_group *group;
 
-	/*
-	 * ADD/DEL call into iommu driver ops if provided, which may
-	 * result in ADD/DEL notifiers to group->notifier
-	 */
 	if (action == BUS_NOTIFY_ADD_DEVICE) {
 		int ret;
 
@@ -1671,34 +1624,6 @@ static int iommu_bus_notifier(struct notifier_block *nb,
 		return NOTIFY_OK;
 	}
 
-	/*
-	 * Remaining BUS_NOTIFYs get filtered and republished to the
-	 * group, if anyone is listening
-	 */
-	group = iommu_group_get(dev);
-	if (!group)
-		return 0;
-
-	switch (action) {
-	case BUS_NOTIFY_BIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
-		break;
-	case BUS_NOTIFY_BOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
-		break;
-	}
-
-	if (group_action)
-		blocking_notifier_call_chain(&group->notifier,
-					     group_action, dev);
-
-	iommu_group_put(group);
 	return 0;
 }
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v4 13/13] drm/tegra: Use the iommu dma_owner mechanism
  2021-12-17  6:36 ` Lu Baolu
@ 2021-12-17  6:37   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-17  6:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel,
	Lu Baolu

From: Jason Gunthorpe <jgg@nvidia.com>

Tegra joins many platform devices onto the same iommu_domain and builds
sort-of a DMA API on top of it.

iommu_attach/detach_device_shared() now supports this usage model. Each
device that wants to use the special domain sets
suppress_auto_claim_dma_owner and calls iommu_attach_device_shared(), which
uses the DMA owner framework to lock out other usages of the group and
refcounts the domain attachment.

When the last device calls detach, the domain is disconnected.
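
The refcounting described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not the kernel
implementation: the names model_group, model_domain, model_attach_shared()
and model_detach_shared() are hypothetical, locking is omitted, and the
model drops to no domain on the last detach, whereas the real group would
fall back to whatever the core chooses.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ownership types used by the series. */
enum model_owner { MODEL_OWNER_NONE, MODEL_OWNER_DMA_API, MODEL_OWNER_PRIVATE_DOMAIN };

struct model_domain { int id; };

/* Only the fields that matter for the refcounting are modelled. */
struct model_group {
	enum model_owner owner;
	unsigned int attach_cnt;
	struct model_domain *domain;	/* attached domain, or NULL */
};

/* First caller attaches the domain, later callers only join it. */
static int model_attach_shared(struct model_group *g, struct model_domain *d)
{
	if (g->owner != MODEL_OWNER_PRIVATE_DOMAIN)
		return -EPERM;		/* ownership must be claimed first */
	if (g->attach_cnt && g->domain != d)
		return -EBUSY;		/* group already uses another domain */
	if (!g->attach_cnt)
		g->domain = d;
	g->attach_cnt++;
	return 0;
}

/* The last caller to leave disconnects the domain from the group. */
static void model_detach_shared(struct model_group *g, struct model_domain *d)
{
	if (!g->attach_cnt || g->domain != d)
		return;			/* unbalanced or mismatched detach */
	if (--g->attach_cnt == 0)
		g->domain = NULL;
}
```

In this model a second client attaching the same domain only bumps the
count, and the domain stays connected until both clients have detached.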

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com> # Nexus7 T30
---
 drivers/gpu/drm/tegra/dc.c   |  1 +
 drivers/gpu/drm/tegra/drm.c  | 54 +++++++++++++++++-------------------
 drivers/gpu/drm/tegra/gr2d.c |  1 +
 drivers/gpu/drm/tegra/gr3d.c |  1 +
 drivers/gpu/drm/tegra/vic.c  |  3 +-
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index a29d64f87563..8fd7a083cc44 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -3108,6 +3108,7 @@ struct platform_driver tegra_dc_driver = {
 	.driver = {
 		.name = "tegra-dc",
 		.of_match_table = tegra_dc_of_match,
+		.suppress_auto_claim_dma_owner = true,
 	},
 	.probe = tegra_dc_probe,
 	.remove = tegra_dc_remove,
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 8d37d6b00562..8a5fd390f85f 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -928,12 +928,15 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 	return 0;
 }
 
+/*
+ * Clients which use this function must set suppress_auto_claim_dma_owner in
+ * their platform_driver's device_driver struct.
+ */
 int host1x_client_iommu_attach(struct host1x_client *client)
 {
 	struct iommu_domain *domain = iommu_get_domain_for_dev(client->dev);
 	struct drm_device *drm = dev_get_drvdata(client->host);
 	struct tegra_drm *tegra = drm->dev_private;
-	struct iommu_group *group = NULL;
 	int err;
 
 	/*
@@ -941,48 +944,41 @@ int host1x_client_iommu_attach(struct host1x_client *client)
 	 * not the shared IOMMU domain, don't try to attach it to a different
 	 * domain. This allows using the IOMMU-backed DMA API.
 	 */
-	if (domain && domain != tegra->domain)
-		return 0;
+	client->group = NULL;
+	if (!client->dev->iommu_group || (domain && domain != tegra->domain))
+		return iommu_device_set_dma_owner(client->dev,
+						  DMA_OWNER_DMA_API, NULL);
 
-	if (tegra->domain) {
-		group = iommu_group_get(client->dev);
-		if (!group)
-			return -ENODEV;
-
-		if (domain != tegra->domain) {
-			err = iommu_attach_group(tegra->domain, group);
-			if (err < 0) {
-				iommu_group_put(group);
-				return err;
-			}
-		}
+	err = iommu_device_set_dma_owner(client->dev,
+					 DMA_OWNER_PRIVATE_DOMAIN, NULL);
+	if (err)
+		return err;
 
-		tegra->use_explicit_iommu = true;
+	err = iommu_attach_device_shared(tegra->domain, client->dev);
+	if (err) {
+		iommu_device_release_dma_owner(client->dev,
+					       DMA_OWNER_PRIVATE_DOMAIN);
+		return err;
 	}
 
-	client->group = group;
-
+	tegra->use_explicit_iommu = true;
+	client->group = client->dev->iommu_group;
 	return 0;
 }
 
 void host1x_client_iommu_detach(struct host1x_client *client)
 {
+	struct iommu_domain *domain = iommu_get_domain_for_dev(client->dev);
 	struct drm_device *drm = dev_get_drvdata(client->host);
 	struct tegra_drm *tegra = drm->dev_private;
-	struct iommu_domain *domain;
 
 	if (client->group) {
-		/*
-		 * Devices that are part of the same group may no longer be
-		 * attached to a domain at this point because their group may
-		 * have been detached by an earlier client.
-		 */
-		domain = iommu_get_domain_for_dev(client->dev);
-		if (domain)
-			iommu_detach_group(tegra->domain, client->group);
-
-		iommu_group_put(client->group);
+		iommu_detach_device_shared(tegra->domain, client->dev);
+		iommu_device_release_dma_owner(client->dev,
+					       DMA_OWNER_PRIVATE_DOMAIN);
 		client->group = NULL;
+	} else {
+		iommu_device_release_dma_owner(client->dev, DMA_OWNER_DMA_API);
 	}
 }
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index de288cba3905..2e8bb9342da2 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -268,6 +268,7 @@ struct platform_driver tegra_gr2d_driver = {
 	.driver = {
 		.name = "tegra-gr2d",
 		.of_match_table = gr2d_match,
+		.suppress_auto_claim_dma_owner = true,
 	},
 	.probe = gr2d_probe,
 	.remove = gr2d_remove,
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index 24442ade0da3..20133ac59e78 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -397,6 +397,7 @@ struct platform_driver tegra_gr3d_driver = {
 	.driver = {
 		.name = "tegra-gr3d",
 		.of_match_table = tegra_gr3d_match,
+		.suppress_auto_claim_dma_owner = true,
 	},
 	.probe = gr3d_probe,
 	.remove = gr3d_remove,
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index c02010ff2b7f..b4f65574e36f 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -523,7 +523,8 @@ struct platform_driver tegra_vic_driver = {
 	.driver = {
 		.name = "tegra-vic",
 		.of_match_table = tegra_vic_of_match,
-		.pm = &vic_pm_ops
+		.pm = &vic_pm_ops,
+		.suppress_auto_claim_dma_owner = true,
 	},
 	.probe = vic_probe,
 	.remove = vic_remove,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-17  6:37   ` Lu Baolu
@ 2021-12-21 16:50     ` Robin Murphy
  -1 siblings, 0 replies; 94+ messages in thread
From: Robin Murphy @ 2021-12-21 16:50 UTC (permalink / raw)
  To: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj
  Cc: Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On 2021-12-17 06:37, Lu Baolu wrote:
> The iommu_attach/detach_device() interfaces were exposed for the device
> drivers to attach/detach their own domains. The commit <426a273834eae>
> ("iommu: Limit iommu_attach/detach_device to device with their own group")
> restricted them to singleton groups so that devices in the same group
> cannot attach different domains.
> 
> Now that device DMA ownership has been introduced into the iommu core, we
> can add interfaces for multiple-device groups while still guaranteeing that
> "all devices are in the same address space".
> 
> The iommu_attach/detach_device_shared() could be used when multiple drivers
> sharing the group claim the DMA_OWNER_PRIVATE_DOMAIN ownership. The first
> call of iommu_attach_device_shared() attaches the domain to the group.
> Other drivers could join it later. The domain will be detached from the
> group after all drivers unjoin it.

I don't see the point of this at all - if you really want to hide the 
concept of IOMMU groups away from drivers then just make 
iommu_{attach,detach}_device() do the right thing. At least the 
iommu_group_get_for_dev() plus iommu_{attach,detach}_group() API is 
clear - this proposal is the worst of both worlds, in that drivers still 
have to be just as aware of groups in order to know whether to call the 
_shared interface or not, except it's now entirely implicit and non-obvious.

Otherwise just add the housekeeping stuff to 
iommu_{attach,detach}_group() - there's no way we want *three* 
attach/detach interfaces all with different semantics.

It's worth taking a step back and realising that overall, this is really 
just a more generalised and finer-grained extension of what 426a273834ea 
already did for non-group-aware code, so it makes little sense *not* to 
integrate it into the existing interfaces.

Robin.

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Tested-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>   include/linux/iommu.h | 13 +++++++
>   drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 92 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5ad4cf13370d..1bc03118dfb3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -703,6 +703,8 @@ int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner ow
>   			      void *owner_cookie);
>   void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
>   bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
> +int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev);
> +void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev);
>   
>   #else /* CONFIG_IOMMU_API */
>   
> @@ -743,11 +745,22 @@ static inline int iommu_attach_device(struct iommu_domain *domain,
>   	return -ENODEV;
>   }
>   
> +static inline int iommu_attach_device_shared(struct iommu_domain *domain,
> +					     struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
>   static inline void iommu_detach_device(struct iommu_domain *domain,
>   				       struct device *dev)
>   {
>   }
>   
> +static inline void iommu_detach_device_shared(struct iommu_domain *domain,
> +					      struct device *dev)
> +{
> +}
> +
>   static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>   {
>   	return NULL;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8bec71b1cc18..3ad66cb9bedc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -50,6 +50,7 @@ struct iommu_group {
>   	struct list_head entry;
>   	enum iommu_dma_owner dma_owner;
>   	unsigned int owner_cnt;
> +	unsigned int attach_cnt;
>   	void *owner_cookie;
>   };
>   
> @@ -3512,3 +3513,81 @@ void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner own
>   	iommu_group_put(group);
>   }
>   EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
> +
> +/**
> + * iommu_attach_device_shared() - Attach shared domain to a device
> + * @domain: The shared domain.
> + * @dev: The device.
> + *
> + * Similar to iommu_attach_device(), but allowed for shared-group devices
> + * and guarantees that all devices in an iommu group could only be attached
> + * by a same iommu domain. The caller should explicitly set the dma ownership
> + * of DMA_OWNER_PRIVATE_DOMAIN or DMA_OWNER_PRIVATE_DOMAIN_USER type before
> + * calling it and use the paired helper iommu_detach_device_shared() for
> + * cleanup.
> + */
> +int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct iommu_group *group;
> +	int ret = 0;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	mutex_lock(&group->mutex);
> +	if (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
> +	    group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER) {
> +		ret = -EPERM;
> +		goto unlock_out;
> +	}
> +
> +	if (group->attach_cnt) {
> +		if (group->domain != domain) {
> +			ret = -EBUSY;
> +			goto unlock_out;
> +		}
> +	} else {
> +		ret = __iommu_attach_group(domain, group);
> +		if (ret)
> +			goto unlock_out;
> +	}
> +
> +	group->attach_cnt++;
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_device_shared);
> +
> +/**
> + * iommu_detach_device_shared() - Detach a domain from device
> + * @domain: The domain.
> + * @dev: The device.
> + *
> + * The detach helper paired with iommu_attach_device_shared().
> + */
> +void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct iommu_group *group;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return;
> +
> +	mutex_lock(&group->mutex);
> +	if (WARN_ON(!group->attach_cnt || group->domain != domain ||
> +		    (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
> +		     group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER)))
> +		goto unlock_out;
> +
> +	if (--group->attach_cnt == 0)
> +		__iommu_detach_group(domain, group);
> +
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_device_shared);

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
@ 2021-12-21 16:50     ` Robin Murphy
  0 siblings, 0 replies; 94+ messages in thread
From: Robin Murphy @ 2021-12-21 16:50 UTC (permalink / raw)
  To: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj
  Cc: Chaitanya Kulkarni, kvm, Stuart Yoder, rafael, David Airlie,
	linux-pci, Cornelia Huck, linux-kernel, Jonathan Hunter, iommu,
	Thierry Reding, Jacob jun Pan, Daniel Vetter, Diana Craciun,
	Dan Williams, Li Yang, Will Deacon, Dmitry Osipenko

On 2021-12-17 06:37, Lu Baolu wrote:
> The iommu_attach/detach_device() interfaces were exposed for the device
> drivers to attach/detach their own domains. The commit <426a273834eae>
> ("iommu: Limit iommu_attach/detach_device to device with their own group")
> restricted them to singleton groups so that devices in the same group
> cannot attach different domains.
> 
> Now that device DMA ownership has been introduced into the iommu core, we
> can add interfaces for multiple-device groups while still guaranteeing that
> "all devices are in the same address space".
> 
> The iommu_attach/detach_device_shared() could be used when multiple drivers
> sharing the group claim the DMA_OWNER_PRIVATE_DOMAIN ownership. The first
> call of iommu_attach_device_shared() attaches the domain to the group.
> Other drivers could join it later. The domain will be detached from the
> group after all drivers unjoin it.

I don't see the point of this at all - if you really want to hide the 
concept of IOMMU groups away from drivers then just make 
iommu_{attach,detach}_device() do the right thing. At least the 
iommu_group_get_for_dev() plus iommu_{attach,detach}_group() API is 
clear - this proposal is the worst of both worlds, in that drivers still 
have to be just as aware of groups in order to know whether to call the 
_shared interface or not, except it's now entirely implicit and non-obvious.

Otherwise just add the housekeeping stuff to 
iommu_{attach,detach}_group() - there's no way we want *three* 
attach/detach interfaces all with different semantics.

It's worth taking a step back and realising that overall, this is really 
just a more generalised and finer-grained extension of what 426a273834ea 
already did for non-group-aware code, so it makes little sense *not* to 
integrate it into the existing interfaces.

Robin.

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Tested-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>   include/linux/iommu.h | 13 +++++++
>   drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 92 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5ad4cf13370d..1bc03118dfb3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -703,6 +703,8 @@ int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner ow
>   			      void *owner_cookie);
>   void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
>   bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
> +int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev);
> +void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev);
>   
>   #else /* CONFIG_IOMMU_API */
>   
> @@ -743,11 +745,22 @@ static inline int iommu_attach_device(struct iommu_domain *domain,
>   	return -ENODEV;
>   }
>   
> +static inline int iommu_attach_device_shared(struct iommu_domain *domain,
> +					     struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
>   static inline void iommu_detach_device(struct iommu_domain *domain,
>   				       struct device *dev)
>   {
>   }
>   
> +static inline void iommu_detach_device_shared(struct iommu_domain *domain,
> +					      struct device *dev)
> +{
> +}
> +
>   static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>   {
>   	return NULL;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8bec71b1cc18..3ad66cb9bedc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -50,6 +50,7 @@ struct iommu_group {
>   	struct list_head entry;
>   	enum iommu_dma_owner dma_owner;
>   	unsigned int owner_cnt;
> +	unsigned int attach_cnt;
>   	void *owner_cookie;
>   };
>   
> @@ -3512,3 +3513,81 @@ void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner own
>   	iommu_group_put(group);
>   }
>   EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
> +
> +/**
> + * iommu_attach_device_shared() - Attach shared domain to a device
> + * @domain: The shared domain.
> + * @dev: The device.
> + *
> + * Similar to iommu_attach_device(), but allowed for shared-group devices
> + * and guarantees that all devices in an iommu group could only be attached
> + * by a same iommu domain. The caller should explicitly set the dma ownership
> + * of DMA_OWNER_PRIVATE_DOMAIN or DMA_OWNER_PRIVATE_DOMAIN_USER type before
> + * calling it and use the paired helper iommu_detach_device_shared() for
> + * cleanup.
> + */
> +int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct iommu_group *group;
> +	int ret = 0;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	mutex_lock(&group->mutex);
> +	if (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
> +	    group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER) {
> +		ret = -EPERM;
> +		goto unlock_out;
> +	}
> +
> +	if (group->attach_cnt) {
> +		if (group->domain != domain) {
> +			ret = -EBUSY;
> +			goto unlock_out;
> +		}
> +	} else {
> +		ret = __iommu_attach_group(domain, group);
> +		if (ret)
> +			goto unlock_out;
> +	}
> +
> +	group->attach_cnt++;
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_device_shared);
> +
> +/**
> + * iommu_detach_device_shared() - Detach a domain from device
> + * @domain: The domain.
> + * @dev: The device.
> + *
> + * The detach helper paired with iommu_attach_device_shared().
> + */
> +void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct iommu_group *group;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return;
> +
> +	mutex_lock(&group->mutex);
> +	if (WARN_ON(!group->attach_cnt || group->domain != domain ||
> +		    (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
> +		     group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER)))
> +		goto unlock_out;
> +
> +	if (--group->attach_cnt == 0)
> +		__iommu_detach_group(domain, group);
> +
> +unlock_out:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_device_shared);
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-21 16:50     ` Robin Murphy
@ 2021-12-21 18:46       ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-21 18:46 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:

> this proposal is the worst of both worlds, in that drivers still have to be
> just as aware of groups in order to know whether to call the _shared
> interface or not, except it's now entirely implicit and non-obvious.

Drivers are not aware of groups, where did you see that?

Drivers have to indicate their intention, based entirely on their own
internal design. Whether groups are present or not is irrelevant to the
driver.

If the driver uses a single struct device (which is most) then it uses
iommu_attach_device().

If the driver uses multiple struct devices and intends to connect them
all to the same domain then it uses the _shared variant. The only
difference between the two is that the _shared variant lacks some of the
protections against driver abuse of the API.
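
For readers following the thread, the counting behaviour of the _shared
pair in the quoted patch can be modelled in plain userspace C. This is
only a sketch under stated assumptions: struct group, struct domain,
attach_shared() and detach_shared() are hypothetical stand-ins, not the
kernel's types or functions, and the real attach/detach work is reduced
to assigning a pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's real definitions. */
enum dma_owner { OWNER_NONE, OWNER_DMA_API, OWNER_PRIVATE_DOMAIN };

struct domain { int id; };

struct group {
	enum dma_owner owner;
	struct domain *domain;    /* currently attached domain, NULL if none */
	unsigned int attach_cnt;  /* number of successful shared attaches */
};

/*
 * Models the attach side: the first caller attaches the whole group,
 * later callers must pass the same domain and only bump the count.
 */
static int attach_shared(struct group *g, struct domain *d)
{
	if (g->owner != OWNER_PRIVATE_DOMAIN)
		return -EPERM;          /* ownership must be claimed first */

	if (g->attach_cnt) {
		if (g->domain != d)
			return -EBUSY;  /* group already uses another domain */
	} else {
		g->domain = d;          /* stands in for __iommu_attach_group() */
	}

	assert(g->domain == d);         /* every holder shares one domain */
	g->attach_cnt++;
	return 0;
}

/* Models the detach side: only the last detach really detaches the group. */
static void detach_shared(struct group *g, struct domain *d)
{
	if (!g->attach_cnt || g->domain != d)
		return;                 /* the quoted patch WARNs in this case */

	if (--g->attach_cnt == 0)
		g->domain = NULL;       /* stands in for __iommu_detach_group() */
}
```

In this model a second device of the same group attaching the same
domain succeeds and increments the count, while an attempt to attach a
different domain fails with -EBUSY until the count drops back to zero.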

Nothing uses the group interface except for VFIO and stuff inside
drivers/iommu. VFIO has a uAPI tied to the group interface and it
is stuck with it.

> Otherwise just add the housekeeping stuff to iommu_{attach,detach}_group() -
> there's no way we want *three* attach/detach interfaces all with different
> semantics.

I'm not sure why you think three APIs is a bad thing. Three APIs with
clearly intended purposes are a lot better than one giant API with a
bunch of parameters that tries to do everything.

In this case, it is not simple to 'add the housekeeping' to
iommu_attach_group() in a way that is useful to both tegra and
VFIO. What tegra wants is what the _shared API implements, and that
logic should not be open coded in drivers.

VFIO does not want exactly that, it has its own logic to deal directly
with groups tied to its uAPI. Due to the uAPI it doesn't even have a
struct device, unfortunately.

The reason there are three APIs is that there are three different
use cases. It is not a bad thing to have APIs designed for the use cases
they serve.

> It's worth taking a step back and realising that overall, this is really
> just a more generalised and finer-grained extension of what 426a273834ea
> already did for non-group-aware code, so it makes little sense *not* to
> integrate it into the existing interfaces.

This is taking 426a to its logical conclusion and *removing* the
group API from the drivers entirely. This is desirable because drivers
cannot do anything sane with the group.

The drivers have struct devices, and so we provide APIs that work in
terms of struct devices to cover both driver use cases today, and do
so more safely than what is already implemented.

Do not mix up VFIO with the driver interface; these are different
things. It is better that VFIO stays on its own and does not complicate
the driver world.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-21 18:46       ` Jason Gunthorpe via iommu
@ 2021-12-22  4:22         ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-22  4:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On 12/22/21 2:46 AM, Jason Gunthorpe wrote:
>> It's worth taking a step back and realising that overall, this is really
>> just a more generalised and finer-grained extension of what 426a273834ea
>> already did for non-group-aware code, so it makes little sense *not* to
>> integrate it into the existing interfaces.
> This is taking 426a to its logical conclusion and *removing* the
> group API from the drivers entirely. This is desirable because drivers
> cannot do anything sane with the group.
> 
> The drivers have struct devices, and so we provide APIs that work in
> terms of struct devices to cover both driver use cases today, and do
> so more safely than what is already implemented.
> 
> Do not mix up VFIO with the driver interface, these are different
> things. It is better VFIO stay on its own and not complicate the
> driver world.

Per Joerg's previous comments:

https://lore.kernel.org/linux-iommu/20211119150612.jhsvsbzisvux2lga@8bytes.org/

Commit 426a273834ea was added only to disallow attaching a single
device within a group to a different iommu_domain. So it's reasonable
to improve the existing iommu_attach/detach_device() to cover all
cases. How about the code below? Did I miss anything?

int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
{
         struct iommu_group *group;
         int ret = 0;

         group = iommu_group_get(dev);
         if (!group)
                 return -ENODEV;

         mutex_lock(&group->mutex);
         if (group->attach_cnt) {
                 if (group->domain != domain) {
                         ret = -EBUSY;
                         goto unlock_out;
                 }
         } else {
                 ret = __iommu_attach_group(domain, group);
                 if (ret)
                         goto unlock_out;
         }

         group->attach_cnt++;
unlock_out:
         mutex_unlock(&group->mutex);
         iommu_group_put(group);

         return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device);

void iommu_detach_device_shared(struct iommu_domain *domain, struct 
device *dev)
{
         struct iommu_group *group;

         group = iommu_group_get(dev);
         if (WARN_ON(!group))
                 return;

         mutex_lock(&group->mutex);
         if (WARN_ON(!group->attach_cnt || group->domain != domain))
                 goto unlock_out;

         if (--group->attach_cnt == 0)
                 __iommu_detach_group(domain, group);

unlock_out:
         mutex_unlock(&group->mutex);
         iommu_group_put(group);
}
EXPORT_SYMBOL_GPL(iommu_detach_device);

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-22  4:22         ` Lu Baolu
@ 2021-12-22  4:25           ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-22  4:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On 12/22/21 12:22 PM, Lu Baolu wrote:
> void iommu_detach_device_shared(struct iommu_domain *domain, struct 
> device *dev)

Sorry for the typo. Please ignore the _shared suffix.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-17  6:36   ` Lu Baolu
@ 2021-12-22 12:47     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2021-12-22 12:47 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Fri, Dec 17, 2021 at 02:36:57PM +0800, Lu Baolu wrote:
> This extends really_probe() to allow checking for dma ownership conflict
> during the driver binding process. By default, the DMA_OWNER_DMA_API is
> claimed for the bound driver before calling its .probe() callback. If this
> operation fails (e.g. the iommu group of the target device already has the
> DMA_OWNER_USER set), the binding process is aborted to avoid breaking the
> security contract for devices in the iommu group.
> 
> Without this change, the vfio driver has to listen to a bus BOUND_DRIVER
> event and then BUG_ON() in case of dma ownership conflict. This leads to
> bad user experience since careless driver binding operation may crash the
> system if the admin overlooks the group restriction. Aside from bad design,
> this leads to a security problem as a root user can force the kernel to
> BUG() even with lockdown=integrity.
> 
> Driver may set a new flag (suppress_auto_claim_dma_owner) to disable auto
> claim in the binding process. Examples include kernel drivers (pci_stub,
> PCI bridge drivers, etc.) which don't trigger DMA at all thus can be safely
> exempted in DMA ownership check and userspace framework drivers (vfio/vdpa
> etc.) which need to manually claim DMA_OWNER_USER when assigning a device
> to userspace.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Link: https://lore.kernel.org/linux-iommu/20210922123931.GI327412@nvidia.com/
> Link: https://lore.kernel.org/linux-iommu/20210928115751.GK964074@nvidia.com/
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  include/linux/device/driver.h |  2 ++
>  drivers/base/dd.c             | 37 ++++++++++++++++++++++++++++++-----
>  2 files changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index a498ebcf4993..f5bf7030c416 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -54,6 +54,7 @@ enum probe_type {
>   * @owner:	The module owner.
>   * @mod_name:	Used for built-in modules.
>   * @suppress_bind_attrs: Disables bind/unbind via sysfs.
> + * @suppress_auto_claim_dma_owner: Disable kernel dma auto-claim.
>   * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
>   * @of_match_table: The open firmware table.
>   * @acpi_match_table: The ACPI match table.
> @@ -100,6 +101,7 @@ struct device_driver {
>  	const char		*mod_name;	/* used for built-in modules */
>  
>  	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
> +	bool suppress_auto_claim_dma_owner;
>  	enum probe_type probe_type;
>  
>  	const struct of_device_id	*of_match_table;
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 68ea1f949daa..b04eec5dcefa 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -28,6 +28,7 @@
>  #include <linux/pm_runtime.h>
>  #include <linux/pinctrl/devinfo.h>
>  #include <linux/slab.h>
> +#include <linux/iommu.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -538,6 +539,32 @@ static int call_driver_probe(struct device *dev, struct device_driver *drv)
>  	return ret;
>  }
>  
> +static int device_dma_configure(struct device *dev, struct device_driver *drv)
> +{
> +	int ret;
> +
> +	if (!dev->bus->dma_configure)
> +		return 0;
> +
> +	ret = dev->bus->dma_configure(dev);
> +	if (ret)
> +		return ret;
> +
> +	if (!drv->suppress_auto_claim_dma_owner)
> +		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);

Wait, the busses that wanted to configure the device, just did so in
their dma_configure callback, so why not do this type of
iommu_device_set_dma_owner() in the few busses that will want this to
happen?

Right now we only have 4 different "busses" that care about this.  Out
of the following callbacks:
	fsl_mc_dma_configure
	host1x_dma_configure
	pci_dma_configure
	platform_dma_configure

Which one will actually care about the iommu_device_set_dma_owner()
call?  All of them?  None of them?  Some of them?

Again, why can't this just happen in the (very few) bus callbacks that
care about this?  In following patches in this series, you turn off this
for the pci_dma_configure users, so what is left?  3 odd bus types that
are not used often.  How well did you test devices of those types with
this patchset?

It's fine to have "suppress" fields when they are the minority, but here
it's a _very_ tiny tiny number of actual devices in a system that will
ever get the chance to have this check happen for them and trigger,
right?

I know others told you to put this in the driver core, but I fail to see
how adding this call to the 3 busses that care about it is a lot more
work than this driver core functionality that we all will have to
maintain forever?
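
To make the flow under discussion concrete, here is a rough userspace
model of what the quoted device_dma_configure() does at bind time. It
is a sketch, not the driver core: struct bus, struct driver,
struct device, struct group, set_dma_owner() and release_dma_owner()
are hypothetical stand-ins, and returning -EBUSY on an ownership
conflict is an assumption about iommu_device_set_dma_owner().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's real definitions. */
enum dma_owner { OWNER_NONE, OWNER_DMA_API, OWNER_USER };

struct group { enum dma_owner owner; unsigned int owner_cnt; };

struct device;
struct bus { int (*dma_configure)(struct device *dev); };
struct driver { bool suppress_auto_claim_dma_owner; };
struct device { const struct bus *bus; struct group *group; };

/* A group can have many holders, but only of one owner type at a time. */
static int set_dma_owner(struct group *g, enum dma_owner owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;  /* assumed error code for a conflict */
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

static void release_dma_owner(struct group *g, enum dma_owner owner)
{
	assert(g->owner_cnt && g->owner == owner);
	if (--g->owner_cnt == 0)
		g->owner = OWNER_NONE;
}

/* Mirrors the shape of the quoted helper: configure, then auto-claim. */
static int device_dma_configure(struct device *dev, const struct driver *drv)
{
	int ret;

	if (!dev->bus->dma_configure)
		return 0;

	ret = dev->bus->dma_configure(dev);
	if (ret)
		return ret;

	if (!drv->suppress_auto_claim_dma_owner)
		ret = set_dma_owner(dev->group, OWNER_DMA_API);

	return ret;
}

/* Trivial bus callback used only to exercise the model. */
static int noop_dma_configure(struct device *dev)
{
	(void)dev;
	return 0;
}
```

In this model a driver that sets the suppress flag binds without
touching the group's owner, and a normal kernel driver fails to bind
once another device in the same group has been claimed for userspace.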

> +
> +	return ret;
> +}
> +
> +static void device_dma_cleanup(struct device *dev, struct device_driver *drv)
> +{
> +	if (!dev->bus->dma_configure)
> +		return;
> +
> +	if (!drv->suppress_auto_claim_dma_owner)
> +		iommu_device_release_dma_owner(dev, DMA_OWNER_DMA_API);
> +}
> +
>  static int really_probe(struct device *dev, struct device_driver *drv)
>  {
>  	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> @@ -574,11 +601,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  	if (ret)
>  		goto pinctrl_bind_failed;
>  
> -	if (dev->bus->dma_configure) {
> -		ret = dev->bus->dma_configure(dev);
> -		if (ret)
> -			goto probe_failed;
> -	}
> +	if (device_dma_configure(dev, drv))
> +		goto pinctrl_bind_failed;

Are you sure you are jumping to the proper error path here?  It is not
obvious why you changed this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-22 12:47     ` Greg Kroah-Hartman
@ 2021-12-22 17:52       ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-22 17:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Lu Baolu, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Wed, Dec 22, 2021 at 01:47:34PM +0100, Greg Kroah-Hartman wrote:

> Right now we only have 4 different "busses" that care about this.  Out
> of the following callbacks:
> 	fsl_mc_dma_configure
> 	host1x_dma_configure
> 	pci_dma_configure
> 	platform_dma_configure
> 
> Which one will actually care about the iommu_device_set_dma_owner()
> call?  All of them?  None of them?  Some of them?

You asked this already, and it was answered - all but host1x require
it, and it is harmless for host1x to do it.

> Again, why can't this just happen in the (very few) bus callbacks that
> care about this?  

Because it is not 'very few', it is all but one. This is why HCH and I
both prefer this arrangement.

Especially since host1x is pretty odd. I wasn't able to find where a
host1x driver is doing DMA using the host1x device.. The places I
looked at already doing DMA used a platform device. So I'm not sure
what its host1x_dma_configure is for, or why host1x calls
of_dma_configure() twice..

> In following patches in this series, you turn off this
> for the pci_dma_configure users, so what is left?  

??? Where do you see this?

> I know others told you to put this in the driver core, but I fail to see
> how adding this call to the 3 busses that care about it is a lot more
> work than this driver core functionality that we all will have to
> maintain for forever?

It is 4, you forgot AMBA's re-use of platform_dma_configure.

Why are you asking to duplicate code that has no reason to be
different based on bus type? That seems like bad practice.

No matter where we put this we have to maintain it "forever", so I'm
not sure what you are trying to say.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-21 18:46       ` Jason Gunthorpe via iommu
@ 2021-12-22 20:26         ` Robin Murphy
  -1 siblings, 0 replies; 94+ messages in thread
From: Robin Murphy @ 2021-12-22 20:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
> On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:
> 
>> this proposal is the worst of both worlds, in that drivers still have to be
>> just as aware of groups in order to know whether to call the _shared
>> interface or not, except it's now entirely implicit and non-obvious.
> 
> Drivers are not aware of groups, where did you see that?

`git grep iommu_attach_group -- :^drivers/iommu :^include`

Did I really have to explain that?

The drivers other than vfio_iommu_type1, however, do have a complete 
failure to handle, or even consider, any group that does not fit the 
particular set of assumptions they are making, but at least they only 
work in a context where that should not occur.

> Drivers have to indicate their intention, based entirely on their own
> internal design. If groups are present, or not is irrelevant to the
> driver.
> 
> If the driver uses a single struct device (which is most) then it uses
> iommu_attach_device().
> 
> If the driver uses multiple struct devices and intends to connect them
> all to the same domain then it uses the _shared variant. The only
> difference between the two is the _shared variant lacks some of the
> protections against driver abuse of the API.

You've lost me again; how are those intentions any different? Attaching 
one device to a private domain is a literal subset of attaching more 
than one device to a private domain. There is no "abuse" of any API 
anywhere; the singleton group restriction exists as a protective measure 
because iommu_attach_device() was already in use before groups were 
really a thing, in contexts where groups happened to be singleton 
already, but anyone adding *new* uses in contexts where that assumption 
might *not* hold would be in trouble. Thus it enforces DMA ownership by 
the most trivial and heavy-handed means of simply preventing it ever 
becoming shared in the first place.

Yes, I'm using the term "DMA ownership" in a slightly different context 
to the one in which you originally proposed it. Please step out of the 
userspace-device-assignment-focused bubble for a moment and stay with me...

So then we have the iommu_attach_group() interface for new code (and 
still nobody has got round to updating the old code to it yet), for 
which the basic use-case is still fundamentally "I want to attach my 
thing to my domain", but at least now forcing explicit awareness that 
"my thing" could possibly be inextricably intertwined with more than 
just the one device they expect, so potential callers should have a good 
think about that. Unfortunately this leaves the matter of who "owns" the 
group entirely in the hands of those callers, which as we've now 
concluded is not great.

One of the main reasons for non-singleton groups to occur is due to ID 
aliasing or lack of isolation well beyond the scope and control of 
endpoint devices themselves, so it's not really fair to expect every 
IOMMU-aware driver to also be aware of that, have any idea of how to 
actually handle it, or especially try to negotiate with random other 
drivers as to whether it might be OK to take control of their DMA 
address space too. The whole point is that *every* domain attach really 
*has* to be considered "shared" because in general drivers can't know 
otherwise. Hence the easy, if crude, fix for the original API.

> Nothing uses the group interface except for VFIO and stuff inside
> drivers/iommu. VFIO has a uAPI tied to the group interface and it
> is stuck with it.

Self-contradiction is getting stronger, careful...

>> Otherwise just add the housekeeping stuff to iommu_{attach,detach}_group() -
>> there's no way we want *three* attach/detach interfaces all with different
>> semantics.
> 
> I'm not sure why you think 3 APIs is a bad thing. Three APIs with
> clearly intended purposes are a lot better than one giant API with a
> bunch of parameters that tries to do everything.

Because there's only one problem to solve! We have the original API 
which does happen to safely enforce ownership, but in an implicit way 
that doesn't scale; then we have the second API which got past the 
topology constraint but unfortunately turns out to just be unsafe in a 
slightly different way, and was supposed to replace the first one but 
hasn't, and is a bit clunky to boot; now you're proposing a third one 
which can correctly enforce safe ownership for any group topology, which 
is simply combining the good bits of the first two. It makes no sense to 
maintain two bad versions of a thing alongside one which works better.
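
As a rough illustration of the three behaviours described above, here is
a user-space toy model. Nothing in it is kernel code: struct toy_group,
its fields and the toy_attach_*() helpers are hypothetical stand-ins, and
the "dma_api_users" counter is only an assumed simplification of the
ownership tracking this series proposes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an IOMMU group. */
struct toy_group {
	unsigned int ndevs;		/* devices in the group */
	unsigned int dma_api_users;	/* members whose drivers rely on the DMA API */
	const void *domain;		/* currently attached private domain, if any */
};

/* First API style: safe only because it refuses non-singleton groups. */
static int toy_attach_singleton(struct toy_group *g, const void *domain)
{
	assert(g != NULL);
	if (g->ndevs != 1)
		return -EBUSY;
	g->domain = domain;
	return 0;
}

/* Second API style: any topology, but nobody checks who else is affected. */
static int toy_attach_group(struct toy_group *g, const void *domain)
{
	assert(g != NULL);
	g->domain = domain;
	return 0;
}

/* Third style: any topology, refused while other members use the DMA API. */
static int toy_attach_owned(struct toy_group *g, const void *domain)
{
	assert(g != NULL);
	if (g->dma_api_users != 0)
		return -EBUSY;
	g->domain = domain;
	return 0;
}
```

With a two-device group where one member is still bound to a DMA API
driver, the first helper refuses on topology alone, the second succeeds
and pulls the address space out from under that driver, and the third
refuses for the right reason.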

I don't see why anything would be a giant API with a bunch of parameters 
- depending on how you look at it, this new proposal is basically either 
iommu_attach_device() with the ability to scale up to non-trivial groups 
properly, or iommu_attach_group() with a potentially better interface 
and actual safety. The former is still more prevalent (and the interface 
argument compelling), so if we put the new implementation behind that, 
with the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN 
automatically, kill off iommu_attach_group() by converting its couple of 
users, and not only have we solved the VFIO problem but we've also 
finally updated all the legacy code for free! Of course you can have a 
separate version for VFIO to attach with DMA_OWNER_PRIVATE_DOMAIN_USER 
if you like, although I still fail to understand the necessity of the 
distinction.

> In this case, it is not simple to 'add the housekeeping' to
> iommu_attach_group() in a way that is useful to both tegra and
> VFIO. What tegra wants is what the _shared API implements, and that
> logic should not be open coded in drivers.
> 
> VFIO does not want exactly that, it has its own logic to deal directly
> with groups tied to its uAPI. Due to the uAPI it doesn't even have a
> struct device, unfortunately.

Nope. VFIO has its own logic to deal with groups because it's the only 
thing that's ever actually tried dealing with groups correctly 
(unsurprisingly, given that it's where they came from), and every other 
private IOMMU domain user is just crippled or broken to some degree. All 
that proves is that we really should be policing groups better in the 
IOMMU core, per this series, because actually fixing all the other users 
to properly validate their device's group would be a ridiculous mess.

What VFIO wants is (conceptually[1]) "attach this device to my domain, 
provided it and any other devices in its group are managed by a driver I 
approve of." Surprise surprise, that's what any other driver wants as 
well! For iommu_attach_device() it was originally implicit, and is now 
further enforced by the singleton group restriction. For Tegra/host1x 
it's implicit in the complete obliviousness to the possibility of that 
not being the case.

Of course VFIO has a struct device if it needs one; it's trivial to 
resolve the member(s) of a group (and even more so once we can assume 
that a group may only ever contain mutually-compatible devices in the 
first place). How do you think vfio_bus_type() works?

VFIO will also need a struct device anyway, because once I get back from 
my holiday in the new year I need to start working with Simon on 
evolving the rest of the API away from bus->iommu_ops to dev->iommu so 
we can finally support IOMMU drivers coexisting[2].

> The reason there are three APIs is because there are three different
> use-cases. It is not a bad thing to have APIs designed for the use cases
> they serve.

Indeed I agree with that second point, I'm just increasingly baffled how 
it's not clear to you that there is only one fundamental use-case here. 
Perhaps I'm too familiar with the history to objectively see how unclear 
the current state of things might be :/

>> It's worth taking a step back and realising that overall, this is really
>> just a more generalised and finer-grained extension of what 426a273834ea
>> already did for non-group-aware code, so it makes little sense *not* to
>> integrate it into the existing interfaces.
> 
> This is taking 426a to its logical conclusion and *removing* the
> group API from the drivers entirely. This is desirable because drivers
> cannot do anything sane with the group.

I am in complete agreement with that (to the point of also not liking 
patch #6).

> The drivers have struct devices, and so we provide APIs that work in
> terms of struct devices to cover both driver use cases today, and do
> so more safely than what is already implemented.

I am in complete agreement with that (given "both" of the supposed 3 
use-cases all being the same).

> Do not mix up VFIO with the driver interface, these are different
> things. It is better VFIO stay on its own and not complicate the
> driver world.

Nope, vfio_iommu_type1 is just a driver, calling the IOMMU API just like 
any other driver. I like the little bit where it passes itself to 
vfio_register_iommu_driver(), which I feel gets this across far more 
poetically than I can manage.

Thanks,
Robin.

[1] Yes, due to the UAPI it actually starts with the whole group rather 
than any particular device within it. Don't nitpick.
[2] 
https://lore.kernel.org/linux-iommu/2021052710373173260118@rock-chips.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-22 20:26         ` Robin Murphy
@ 2021-12-23  0:57           ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-23  0:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Wed, Dec 22, 2021 at 08:26:34PM +0000, Robin Murphy wrote:
> On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
> > On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:
> > 
> > > this proposal is the worst of both worlds, in that drivers still have to be
> > > just as aware of groups in order to know whether to call the _shared
> > > interface or not, except it's now entirely implicit and non-obvious.
> > 
> > Drivers are not aware of groups, where did you see that?
> 
> `git grep iommu_attach_group -- :^drivers/iommu :^include`
> 
> Did I really have to explain that?

Well, yes you did, because it shows you haven't understood my
question. After this series we deleted all those calls (though Lu, we
missed one of the tegra ones in staging, let's get it for the next
posting)

So, after this series, where do you see drivers being aware of groups?
If things are missed, let's expect to fix them.

> > If the driver uses multiple struct devices and intends to connect them
> > all to the same domain then it uses the _shared variant. The only
> > difference between the two is the _shared variant lacks some of the
> > protections against driver abuse of the API.
> 
> You've lost me again; how are those intentions any different? Attaching one
> device to a private domain is a literal subset of attaching more than one
> device to a private domain. 

Yes it is a subset, but drivers will malfunction if they are not
designed to have multi-attachment and wrongly get it, and there is
only one driver that does actually need this.

I maintain a big driver subsystem and have learned that grepability of
the driver mess for special cases is quite a good thing to
have. Forcing drivers to mark in code when they do something weird is
an advantage, even if it causes some small API redundancy.

However, if you really feel strongly this should really be one API
with the _shared implementation I won't argue it any further.

> So then we have the iommu_attach_group() interface for new code (and still
> nobody has got round to updating the old code to it yet), for which
> the

This series is going in the direction of eliminating
iommu_attach_group() as part of the driver
interface. iommu_attach_group() is repurposed to only be useful for
VFIO.

> properly, or iommu_attach_group() with a potentially better interface and
> actual safety. The former is still more prevalent (and the interface
> argument compelling), so if we put the new implementation behind that, with
> the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN automatically, kill
> off iommu_attach_group() by converting its couple of users, 

This is what we did: iommu_attach_device() & _shared() are to be the
only interface for the drivers, and we killed off the couple of
iommu_attach_group() users except VFIO (the miss in drivers/staging
excepted).

> and not only have we solved the VFIO problem but we've also finally
> updated all the legacy code for free! Of course you can have a
> separate version for VFIO to attach with
> DMA_OWNER_PRIVATE_DOMAIN_USER if you like, although I still fail to
> understand the necessity of the distinction.

And the separate version for VFIO is called 'iommu_attach_group()'.

Lu, it is probably a good idea to add an assertion here that the group
is in DMA_OWNER_PRIVATE_DOMAIN_USER to make it clear that
iommu_attach_group() is only for VFIO.

VFIO has a special requirement that it be able to do:

+       ret = iommu_group_set_dma_owner(group->iommu_group,
+                                       DMA_OWNER_PRIVATE_DOMAIN_USER, f.file);

Without having an iommu_domain to attach.

This is because of the giant special case that PPC made of VFIO's
IOMMU code. PPC (aka vfio_iommu_spapr_tce.c) requires the group
isolation that iommu_group_set_dma_owner() provides, but does not
actually have an iommu_domain and can not/does not call
iommu_attach_group().

Fixing this is a whole other giant adventure I'm hoping David will
help me unwind next year.. 

This series solves this problem by using the two step sequence of
iommu_group_set_dma_owner()/iommu_attach_group() and conceptually
redefining how iommu_attach_group() works to require the external
caller to have done the iommu_group_set_dma_owner() for it. This is
why the series has three APIs, because the VFIO special one assumes
external iommu_group_set_dma_owner(). It just happens that is exactly
the same code as iommu_attach_group() today.

As for why DMA_OWNER_PRIVATE_DOMAIN_USER exists: VFIO doesn't have
an iommu_domain at this point but it still needs the iommu core to
detach the default domain. This is what the _USER does.

Soo..

There is another way to organize this and perhaps it does make more
sense. I will try to sketch briefly in email, try to imagine the
gaps..

API family (== compares to this series):

   iommu_device_use_dma_api(dev);
     == iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);

   iommu_group_set_dma_owner(group, file);
     == iommu_device_set_dma_owner(dev, DMA_OWNER_PRIVATE_DOMAIN_USER,
                                   file);
     Always detaches all domains from the group

   iommu_attach_device(domain, dev)
     == as is in this patch
     dev and domain are 1:1

   iommu_attach_device_shared(domain, dev)
     == as is in this patch
     dev and domain are N:1
     * could just be the same as iommu_attach_device

   iommu_replace_group_domain(group, old_domain, new_domain)
     Makes group point at new_domain. new_domain can be NULL.

   iommu_device_unuse_dma_api(dev)
    == iommu_device_release_dma_owner() in this patch

   iommu_group_release_dma_owner(group)
    == iommu_detach_group() && iommu_group_release_dma_owner()

VFIO would use the sequence:

   iommu_group_set_dma_owner(group, file);
   iommu_replace_group_domain(group, NULL, domain_1);
   iommu_replace_group_domain(group, domain_1, domain_2);
   iommu_group_release_dma_owner(group);

Simple devices would use

   iommu_attach_device(domain, dev);
   iommu_detach_device(domain, dev);

Tegra would use:

   iommu_attach_device_shared(domain, dev);
   iommu_detach_device_shared(domain, dev);
   // Or not, if people agree we should not mark this

DMA API would have the driver core dma_configure do:
   iommu_device_use_dma_api(dev);
   dev->driver->probe()
   iommu_device_unuse_dma_api(dev);

It is more APIs overall, but perhaps they have a much clearer
purpose. 

I think it would be clear why iommu_group_set_dma_owner(), which
actually does detach, is not the same thing as iommu_attach_device().
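
To make the intended state transitions concrete, here is a small
user-space toy model of the API family sketched above. It is only an
illustration under assumptions: the toy_*() names, struct toy_group and
its fields are hypothetical, the error codes are guesses, and the
ownership rules are simplified to "DMA API users and a userspace owner
exclude each other".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_owner { TOY_OWNER_NONE, TOY_OWNER_DMA_API, TOY_OWNER_USER };

struct toy_domain { int id; };

/* Hypothetical stand-in for an IOMMU group and its ownership state. */
struct toy_group {
	enum toy_owner owner;
	unsigned int dma_api_cnt;		/* drivers bound via the DMA API */
	const void *user_cookie;		/* stands in for the struct file */
	struct toy_domain *domain;		/* NULL means nothing attached */
	struct toy_domain *default_domain;
};

/* Models iommu_device_use_dma_api(): a kernel driver is about to bind. */
static int toy_use_dma_api(struct toy_group *g)
{
	if (g->owner == TOY_OWNER_USER)
		return -EBUSY;
	g->owner = TOY_OWNER_DMA_API;
	g->dma_api_cnt++;
	return 0;
}

/* Models iommu_device_unuse_dma_api(): the driver has unbound. */
static void toy_unuse_dma_api(struct toy_group *g)
{
	assert(g->owner == TOY_OWNER_DMA_API && g->dma_api_cnt > 0);
	if (--g->dma_api_cnt == 0)
		g->owner = TOY_OWNER_NONE;
}

/* Models iommu_group_set_dma_owner(group, file): always detaches. */
static int toy_group_set_dma_owner(struct toy_group *g, const void *cookie)
{
	if (g->owner != TOY_OWNER_NONE)
		return -EBUSY;
	g->owner = TOY_OWNER_USER;
	g->user_cookie = cookie;
	g->domain = NULL;
	return 0;
}

/* Models iommu_replace_group_domain(group, old_domain, new_domain). */
static int toy_replace_group_domain(struct toy_group *g,
				    struct toy_domain *old_domain,
				    struct toy_domain *new_domain)
{
	if (g->owner != TOY_OWNER_USER)
		return -EPERM;
	if (g->domain != old_domain)
		return -EINVAL;
	g->domain = new_domain;		/* new_domain may be NULL */
	return 0;
}

/* Models iommu_group_release_dma_owner(): back to the default domain. */
static void toy_group_release_dma_owner(struct toy_group *g)
{
	assert(g->owner == TOY_OWNER_USER);
	g->owner = TOY_OWNER_NONE;
	g->user_cookie = NULL;
	g->domain = g->default_domain;
}
```

In this model the VFIO sequence is set_dma_owner, replace(NULL, domain_1),
replace(domain_1, domain_2), release, and a DMA API driver cannot bind to
any device in the group until the release has happened.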

I'm not sure if this entirely eliminates
DMA_OWNER_PRIVATE_DOMAIN_USER, or not, but at least it isn't in the
API.

Is it better?

> What VFIO wants is (conceptually[1]) "attach this device to my domain,
> provided it and any other devices in its group are managed by a driver I
> approve of." 

Yes, sure, "conceptually". But, there are troublesome details.

> VFIO will also need a struct device anyway, because once I get back from my
> holiday in the new year I need to start working with Simon on evolving the
> rest of the API away from bus->iommu_ops to dev->iommu so we can finally
> support IOMMU drivers coexisting[2].

For VFIO it would be much easier to get the ops from the struct
iommu_group (eg via iommu_group->default_domain->ops, or whatever).

> Indeed I agree with that second point, I'm just increasingly baffled how
> it's not clear to you that there is only one fundamental use-case here.
> Perhaps I'm too familiar with the history to objectively see how unclear the
> current state of things might be :/

I think it is because you are just not familiar with the dark corners
of VFIO. 

VFIO has a special case, which I outlined above.

> > This is taking 426a to its logical conclusion and *removing* the
> > group API from the drivers entirely. This is desirable because drivers
> > cannot do anything sane with the group.
> 
> I am in complete agreement with that (to the point of also not liking patch
> #6).

Unfortunately patch #6 is only because of VFIO needing to use the
group as a handle.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-22 12:47     ` Greg Kroah-Hartman
@ 2021-12-23  2:08       ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-23  2:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: baolu.lu, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Greg,

On 12/22/21 8:47 PM, Greg Kroah-Hartman wrote:
> Which one will actually care about the iommu_device_set_dma_owner()
> call?  All of them?  None of them?  Some of them?
> 
> Again, why can't this just happen in the (very few) bus callbacks that
> care about this?  In following patches in this series, you turn off this
> for the pci_dma_configure users, so what is left?  3 odd bus types that
> are not used often.  How well did you test devices of those types with
> this patchset?
> 
> It's fine to have "suppress" fields when they are the minority, but here
> it's a _very_ tiny tiny number of actual devices in a system that will
> ever get the chance to have this check happen for them and trigger,
> right?

Thank you for your comments. The current VFIO implementation supports
devices on the pci/platform/amba/fsl-mc buses for user-space DMA, so only
those buses need to call iommu_device_set/release_dma_owner() in their
dma_configure/cleanup() callbacks.

The "suppress" field is only for a few device drivers (not devices), for
example,

- vfio-pci, a PCI device driver used to bind to a PCI device so that it
   could be assigned for user-space DMA.

Other similar drivers in drivers/vfio are vfio-fsl-mc, vfio-amba and
vfio-platform. These drivers will call
iommu_device_set/release_dma_owner(DMA_OWNER_USER) explicitly when the
device is assigned to userspace.

The logic is that on the affected buses (pci/platform/amba/fsl-mc),

- for non-vfio drivers, bus dma_configure/cleanup() will automatically
   call iommu_device_set_dma_owner(KERNEL) for the device; [This is the
   majority case.]

- for vfio drivers, the auto-call will be suppressed, and the vfio
   drivers are supposed to call iommu_device_set_dma_owner(USER) before
   the device is assigned to userspace. [This is the rare case.]

The KERNEL and USER conflict will be detected in
iommu_device_set_dma_owner() with a -EBUSY return value. In that case,
the driver binding or device assignment should be aborted.
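
The KERNEL/USER conflict rule described above can be illustrated with a small userspace model. This is a sketch under assumptions, not the patch's code: the `OWNER_*` values, `struct group` with its reference count, and `bind_driver()` are hypothetical names chosen for the example; only the `suppress_auto_claim_dma_owner` flag and the -EBUSY result come from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum dma_owner { OWNER_NONE, OWNER_KERNEL, OWNER_USER };

/* Hypothetical model of an IOMMU group shared by several devices. */
struct group {
	enum dma_owner owner;
	unsigned int owner_cnt;  /* how many devices hold the current claim */
};

struct driver { bool suppress_auto_claim_dma_owner; };
struct device { struct group *group; };

/* Claim ownership for one device; mixed KERNEL/USER claims are refused. */
static int set_dma_owner(struct device *dev, enum dma_owner owner)
{
	struct group *g = dev->group;

	assert(owner != OWNER_NONE);
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

/* Drop one claim; the group becomes unowned when the last claim goes. */
static void release_dma_owner(struct device *dev)
{
	struct group *g = dev->group;

	assert(g->owner_cnt > 0);
	if (--g->owner_cnt == 0)
		g->owner = OWNER_NONE;
}

/* What the bus dma_configure() path would do at driver bind time. */
static int bind_driver(struct device *dev, const struct driver *drv)
{
	if (drv->suppress_auto_claim_dma_owner)
		return 0;  /* vfio-style driver claims OWNER_USER later */
	return set_dma_owner(dev, OWNER_KERNEL);
}
```

With two devices in one group, binding a kernel driver to the first makes a later USER claim on the second fail with -EBUSY, and the reverse order fails the kernel driver bind instead.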

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-22 12:47     ` Greg Kroah-Hartman
@ 2021-12-23  3:02       ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-23  3:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: baolu.lu, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Greg,

On 12/22/21 8:47 PM, Greg Kroah-Hartman wrote:
>> +
>> +	return ret;
>> +}
>> +
>> +static void device_dma_cleanup(struct device *dev, struct device_driver *drv)
>> +{
>> +	if (!dev->bus->dma_configure)
>> +		return;
>> +
>> +	if (!drv->suppress_auto_claim_dma_owner)
>> +		iommu_device_release_dma_owner(dev, DMA_OWNER_DMA_API);
>> +}
>> +
>>   static int really_probe(struct device *dev, struct device_driver *drv)
>>   {
>>   	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
>> @@ -574,11 +601,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>>   	if (ret)
>>   		goto pinctrl_bind_failed;
>>   
>> -	if (dev->bus->dma_configure) {
>> -		ret = dev->bus->dma_configure(dev);
>> -		if (ret)
>> -			goto probe_failed;
>> -	}
>> +	if (device_dma_configure(dev, drv))
>> +		goto pinctrl_bind_failed;
> Are you sure you are jumping to the proper error path here?  It is not
> obvious why you changed this.

The error handling path in really_probe() seems a bit wrong. For
example,

        /* If using pinctrl, bind pins now before probing */
        ret = pinctrl_bind_pins(dev);
        if (ret)
                goto pinctrl_bind_failed;

[...]

pinctrl_bind_failed:
        device_links_no_driver(dev);
        devres_release_all(dev);
        arch_teardown_dma_ops(dev);
        kfree(dev->dma_range_map);
        dev->dma_range_map = NULL;
        driver_sysfs_remove(dev);
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        dev->driver = NULL;
        dev_set_drvdata(dev, NULL);
        if (dev->pm_domain && dev->pm_domain->dismiss)
                dev->pm_domain->dismiss(dev);
        pm_runtime_reinit(dev);
        dev_pm_set_driver_flags(dev, 0);
done:
        return ret;

driver_sysfs_remove() will be called even if driver_sysfs_add() hasn't
been called yet. I can fix this in a separate patch, unless I have
missed something.
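
For reference, the usual way to avoid this kind of mismatch is to give each setup step its own unwind label, so that a failure undoes only what has already been done. The following is a minimal userspace sketch of that pattern, not a proposed change to really_probe(): `struct probe_state`, its counters, and the `fail_*` parameters are hypothetical and only stand in for the real calls named in the comments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bookkeeping for two setup steps. */
struct probe_state {
	int pins_bound;   /* 1 while pins are bound */
	int sysfs_links;  /* 1 while the driver sysfs links exist */
};

static int probe(struct probe_state *st, int fail_pins, int fail_probe)
{
	int ret;

	assert(st != NULL);

	ret = fail_pins ? -EINVAL : 0;   /* stands in for pinctrl_bind_pins() */
	if (ret)
		goto pins_failed;
	st->pins_bound = 1;

	st->sysfs_links++;               /* stands in for driver_sysfs_add() */

	ret = fail_probe ? -EINVAL : 0;  /* stands in for the driver's probe() */
	if (ret)
		goto probe_failed;

	return 0;

probe_failed:
	st->sysfs_links--;               /* undo driver_sysfs_add() */
	st->pins_bound = 0;
pins_failed:
	/* The sysfs links were never created on this path; leave them alone. */
	return ret;
}
```

An early failure leaves `sysfs_links` at zero rather than driving it negative, which is the property the current `pinctrl_bind_failed` label does not guarantee.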

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-23  0:57           ` Jason Gunthorpe via iommu
@ 2021-12-23  5:53             ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-23  5:53 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

Hi Robin and Jason,

On 12/23/21 8:57 AM, Jason Gunthorpe wrote:
> On Wed, Dec 22, 2021 at 08:26:34PM +0000, Robin Murphy wrote:
>> On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
>>> On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:
>>>
>>>> this proposal is the worst of both worlds, in that drivers still have to be
>>>> just as aware of groups in order to know whether to call the _shared
>>>> interface or not, except it's now entirely implicit and non-obvious.
>>>
>>> Drivers are not aware of groups, where did you see that?
>>
>> `git grep iommu_attach_group -- :^drivers/iommu :^include`
>>
>> Did I really have to explain that?
> 
> Well, yes you did, because it shows you haven't understood my
> question. After this series we deleted all those calls (though Lu, we
> missed one of the tegra ones in staging, let's get it for the next
> posting)

Yes, I will.

> 
> So, after this series, where do you see drivers being aware of groups?
> If things are missed let's expect to fix them.
> 
>>> If the driver uses multiple struct devices and intends to connect them
>>> all to the same domain then it uses the _shared variant. The only
>>> difference between the two is the _shared variant lacks some of the
>>> protections against driver abuse of the API.
>>
>> You've lost me again; how are those intentions any different? Attaching one
>> device to a private domain is a literal subset of attaching more than one
>> device to a private domain.
> 
> Yes it is a subset, but drivers will malfunction if they are not
> designed to have multi-attachment and wrongly get it, and there is
> only one driver that does actually need this.
> 
> I maintain a big driver subsystem and have learned that grepability of
> the driver mess for special cases is quite a good thing to
> have. Forcing drivers to mark in code when they do something weird is
> an advantage, even if it causes some small API redundancy.
> 
> However, if you really feel strongly this should really be one API
> with the _shared implementation I won't argue it any further.
> 
>> So then we have the iommu_attach_group() interface for new code (and still
>> nobody has got round to updating the old code to it yet), for which
>> the
> 
> This series is going in the direction of eliminating
> iommu_attach_group() as part of the driver
> interface. iommu_attach_group() is repurposed to only be useful for
> VFIO.

We can also remove iommu_attach_group() in VFIO because it is
essentially equivalent to

	iommu_group_for_each_dev(group, iommu_attach_device(dev))
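
The one-liner above is shorthand; the group iterator takes a callback plus an opaque data pointer. A rough userspace model of the equivalence follows. It is a sketch under assumptions: the struct layouts and `attach_one()` are hypothetical, and the stop-on-first-error behaviour is a property of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects. */
struct domain { int id; };
struct device { struct domain *domain; };
struct group { struct device **devs; size_t ndevs; };

/* Call fn on every device in the group, stopping at the first error. */
static int group_for_each_dev(struct group *g, void *data,
			      int (*fn)(struct device *, void *))
{
	size_t i;
	int ret = 0;

	assert(g != NULL && fn != NULL);
	for (i = 0; i < g->ndevs && !ret; i++)
		ret = fn(g->devs[i], data);
	return ret;
}

/* Per-device attach callback; data carries the target domain. */
static int attach_one(struct device *dev, void *data)
{
	struct domain *domain = data;

	if (dev->domain && dev->domain != domain)
		return -EBUSY;
	dev->domain = domain;
	return 0;
}
```

Note that a per-device loop can fail part way through and leave earlier devices attached, so a real replacement for the group call would also need a rollback path.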

> 
>> properly, or iommu_attach_group() with a potentially better interface and
>> actual safety. The former is still more prevalent (and the interface
>> argument compelling), so if we put the new implementation behind that, with
>> the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN automatically, kill
>> off iommu_attach_group() by converting its couple of users,
> 
> This is what we did, iommu_attach_device() & _shared() are to be the
> only interface for the drivers, and we killed off the
> iommu_attach_group() couple of users except VFIO (the miss of
> drivers/staging excepted)
> 
>> and not only have we solved the VFIO problem but we've also finally
>> updated all the legacy code for free! Of course you can have a
>> separate version for VFIO to attach with
>> DMA_OWNER_PRIVATE_DOMAIN_USER if you like, although I still fail to
>> understand the necessity of the distinction.
> 
> And the separate version for VFIO is called 'iommu_attach_group()'.
> 
> Lu, it is probably a good idea to add an assertion here that the group
> is in DMA_OWNER_PRIVATE_DOMAIN_USER to make it clear that
> iommu_attach_group() is only for VFIO.
> 
> VFIO has a special requirement that it be able to do:
> 
> +       ret = iommu_group_set_dma_owner(group->iommu_group,
> +                                       DMA_OWNER_PRIVATE_DOMAIN_USER, f.file);
> 
> Without having a iommu_domain to attach.
> 
> This is because of the giant special case that PPC made of VFIO's
> IOMMU code. PPC (aka vfio_iommu_spapr_tce.c) requires the group
> isolation that iommu_group_set_dma_owner() provides, but does not
> actually have an iommu_domain and can not/does not call
> iommu_attach_group().
> 
> Fixing this is a whole other giant adventure I'm hoping David will
> help me unwind next year..
> 
> This series solves this problem by using the two step sequence of
> iommu_group_set_dma_owner()/iommu_attach_group() and conceptually
> redefining how iommu_attach_group() works to require the external
> caller to have done the iommu_group_set_dma_owner() for it. This is
> why the series has three APIs, because the VFIO special one assumes
> external iommu_group_set_dma_owner(). It just happens that is exactly
> the same code as iommu_attach_group() today.
> 
> As for why does DMA_OWNER_PRIVATE_DOMAIN_USER exist? VFIO doesn't have
> an iommu_domain at this point but it still needs the iommu core to
> detach the default domain. This is what the _USER does.

There is also a contract that after the USER ownership is claimed the
device can be accessed by userspace through its MMIO registers. So
a device could be accessible by userspace before a user-space I/O
address space is attached.

> 
> Soo..
> 
> There is another way to organize this and perhaps it does make more
> sense. I will try to sketch briefly in email, try to imagine the
> gaps..
> 
> API family (== compares to this series):
> 
>     iommu_device_use_dma_api(dev);
>       == iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);
> 
>     iommu_group_set_dma_owner(group, file);
>       == iommu_device_set_dma_owner(dev, DMA_OWNER_PRIVATE_DOMAIN_USER,
>                                     file);
>       Always detaches all domains from the group

I hope we can drop all of the group variant APIs, as we already have the
per-device interfaces: just iterate over all devices in the group and call
the device API.

> 
>     iommu_attach_device(domain, dev)
>       == as is in this patch
>       dev and domain are 1:1
> 
>     iommu_attach_device_shared(domain, dev)
>       == as is in this patch
>       dev and domain are N:1
>       * could just be the same as iommu_attach_device
> 
>     iommu_replace_group_domain(group, old_domain, new_domain)
>       Makes group point at new_domain. new_domain can be NULL.
> 
>     iommu_device_unuse_dma_api(dev)
>      == iommu_device_release_dma_owner() in this patch
> 
>     iommu_group_release_dma_owner(group)
>      == iommu_detach_group() && iommu_group_release_dma_owner()
> 
> VFIO would use the sequence:
> 
>     iommu_group_set_dma_owner(group, file);
>     iommu_replace_group_domain(group, NULL, domain_1);
>     iommu_replace_group_domain(group, domain_1, domain_2);
>     iommu_group_release_dma_owner(group);
> 
> Simple devices would use
> 
>     iommu_attach_device(domain, dev);
>     iommu_detach_device(domain, dev);
> 
> Tegra would use:
> 
>     iommu_attach_device_shared(domain, dev);
>     iommu_detach_device_shared(domain, dev);
>     // Or not, if people agree we should not mark this
> 
> DMA API would have the driver core dma_configure do:
>     iommu_device_use_dma_api(dev);
>     dev->driver->probe()
>     iommu_device_unuse_dma_api(dev);
> 
> It is more APIs overall, but perhaps they have a much clearer
> purpose.
> 
> I think it would be clear why iommu_group_set_dma_owner(), which
> actually does detach, is not the same thing as iommu_attach_device().

iommu_device_set_dma_owner() will eventually call
iommu_group_set_dma_owner(). I don't see why
iommu_group_set_dma_owner() is special and needs to be kept.

> 
> I'm not sure if this entirely eliminates
> DMA_OWNER_PRIVATE_DOMAIN_USER, or not, but at least it isn't in the
> API.
> 
> Is it better?

Perhaps I have missed something, but I have a simpler idea. We only need
the following interfaces:

	iommu_device_set_dma_owner(dev, owner);
	iommu_device_release_dma_owner(dev, owner);
	iommu_attach_device(domain, dev, owner);
	iommu_detach_device(domain, dev);

All existing drivers calling iommu_attach_device() remain unchanged
since we already have singleton group enforcement. We only need to add
a default owner type.

For multi-device groups, as with drm/tegra, the drivers should claim the
PRIVATE_DOMAIN ownership and call iommu_attach_device(domain, dev,
PRIVATE_DOMAIN) explicitly.

The new iommu_attach_device(domain, dev, owner) is a mix of the existing
iommu_attach_device() and the new iommu_attach_device_shared(). That
means,
	if (group_is_singleton(group))
		__iommu_attach_device(domain, dev);
	else
		__iommu_attach_device_shared(domain, dev, owner);

The group variant interfaces will be deprecated and replaced by the
device ones.
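
To make the dispatch above concrete, here is a small userspace model of a single attach entry point that branches on whether the group is a singleton. It is only a sketch of one reading of the idea: the struct layouts, the `OWNER_*` values, and the choice of error codes are hypothetical and are not taken from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum dma_owner { OWNER_DEFAULT, OWNER_PRIVATE_DOMAIN };

/* Hypothetical stand-ins for the kernel objects. */
struct domain { int id; };
struct group {
	int ndevs;              /* number of devices in the group */
	enum dma_owner owner;   /* ownership already claimed for the group */
	struct domain *domain;  /* currently attached domain, or NULL */
};
struct device { struct group *group; };

static int attach_device(struct domain *domain, struct device *dev,
			 enum dma_owner owner)
{
	struct group *g = dev->group;

	assert(domain != NULL);

	/*
	 * A multi-device group is only usable if the caller has claimed
	 * PRIVATE_DOMAIN ownership and says so explicitly. A singleton
	 * group needs no such marking, as with today's attach call.
	 */
	if (g->ndevs > 1 &&
	    (owner != OWNER_PRIVATE_DOMAIN || g->owner != OWNER_PRIVATE_DOMAIN))
		return -EINVAL;

	if (g->domain && g->domain != domain)
		return -EBUSY;
	g->domain = domain;
	return 0;
}
```

Existing single-device callers would pass the default owner and see no change, while a driver like drm/tegra would have to opt in visibly at the call site.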

Sorry if I missed anything.

> 
>> What VFIO wants is (conceptually[1]) "attach this device to my domain,
>> provided it and any other devices in its group are managed by a driver I
>> approve of."
> 
> Yes, sure, "conceptually". But, there are troublesome details.
> 
>> VFIO will also need a struct device anyway, because once I get back from my
>> holiday in the new year I need to start working with Simon on evolving the
>> rest of the API away from bus->iommu_ops to dev->iommu so we can finally
>> support IOMMU drivers coexisting[2].
> 
> For VFIO it would be much easier to get the ops from the struct
> iommu_group (eg via iommu_group->default_domain->ops, or whatever).
> 
>> Indeed I agree with that second point, I'm just increasingly baffled how
>> it's not clear to you that there is only one fundamental use-case here.
>> Perhaps I'm too familiar with the history to objectively see how unclear the
>> current state of things might be :/
> 
> I think it is because you are just not familiar with the dark corners
> of VFIO.
> 
> VFIO has a special case, I outlined above.
> 
> >>> This is taking 426a to its logical conclusion and *removing* the
>>> group API from the drivers entirely. This is desirable because drivers
>>> cannot do anything sane with the group.
>>
>> I am in complete agreement with that (to the point of also not liking patch
>> #6).
> 
> Unfortunately patch #6 is only because of VFIO needing to use the
> group as a handle.
> 
> Jason
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-23  3:02       ` Lu Baolu
@ 2021-12-23  7:13         ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2021-12-23  7:13 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Thu, Dec 23, 2021 at 11:02:54AM +0800, Lu Baolu wrote:
> Hi Greg,
> 
> On 12/22/21 8:47 PM, Greg Kroah-Hartman wrote:
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void device_dma_cleanup(struct device *dev, struct device_driver *drv)
> > > +{
> > > +	if (!dev->bus->dma_configure)
> > > +		return;
> > > +
> > > +	if (!drv->suppress_auto_claim_dma_owner)
> > > +		iommu_device_release_dma_owner(dev, DMA_OWNER_DMA_API);
> > > +}
> > > +
> > >   static int really_probe(struct device *dev, struct device_driver *drv)
> > >   {
> > >   	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> > > @@ -574,11 +601,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >   	if (ret)
> > >   		goto pinctrl_bind_failed;
> > > -	if (dev->bus->dma_configure) {
> > > -		ret = dev->bus->dma_configure(dev);
> > > -		if (ret)
> > > -			goto probe_failed;
> > > -	}
> > > +	if (device_dma_configure(dev, drv))
> > > +		goto pinctrl_bind_failed;
> > Are you sure you are jumping to the proper error path here?  It is not
> > obvious why you changed this.
> 
> The error handling path in really_probe() seems a bit wrong. For
> example,
> 
> 	/* If using pinctrl, bind pins now before probing */
> 	ret = pinctrl_bind_pins(dev);
> 	if (ret)
> 		goto pinctrl_bind_failed;
> 
> [...]
> 
> pinctrl_bind_failed:
> 	device_links_no_driver(dev);
> 	devres_release_all(dev);
> 	arch_teardown_dma_ops(dev);
> 	kfree(dev->dma_range_map);
> 	dev->dma_range_map = NULL;
> 	driver_sysfs_remove(dev);
> 	^^^^^^^^^^^^^^^^^^^^^^^^^
> 	dev->driver = NULL;
> 	dev_set_drvdata(dev, NULL);
> 	if (dev->pm_domain && dev->pm_domain->dismiss)
> 		dev->pm_domain->dismiss(dev);
> 	pm_runtime_reinit(dev);
> 	dev_pm_set_driver_flags(dev, 0);
> done:
> 	return ret;
> 
> driver_sysfs_remove() will be called even if driver_sysfs_add() hasn't
> been called yet. I can fix this in a separate patch if I haven't missed
> anything.

If this is a bug in the existing kernel, please submit it as a separate
patch so that it can be properly backported to all affected kernels.
Never bury it in an unrelated change that will never get sent to older
kernels.
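
For anyone following the error-path concern, the usual staged-unwind
pattern can be sketched in userspace as below. This is not the
really_probe() code: the step names, the trace buffer and the fail_at
parameter are invented for illustration. Each failure jumps to a label
that undoes only the steps that already succeeded.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace: uppercase = step attempted, lowercase = step undone. */
static char trace[64];

static void note(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

static int run_step(const char *name, int fail)
{
	note(name);
	return fail ? -1 : 0;
}

/* fail_at: 0 = success, 1 = pin binding, 2 = sysfs add, 3 = driver probe */
int probe_model(int fail_at)
{
	int ret;

	assert(fail_at >= 0 && fail_at <= 3);
	trace[0] = '\0';

	ret = run_step("P", fail_at == 1);	/* bind pins */
	if (ret)
		goto pins_failed;

	ret = run_step("S", fail_at == 2);	/* add sysfs links */
	if (ret)
		goto sysfs_failed;

	ret = run_step("D", fail_at == 3);	/* call the driver's probe */
	if (ret)
		goto probe_failed;

	return 0;

probe_failed:
	note("s");	/* sysfs links exist only on this path, so remove them */
sysfs_failed:
	note("p");	/* pins were bound before the sysfs step, release them */
pins_failed:
	return ret;	/* nothing was set up, nothing to undo */
}
```

The point of the model is that the sysfs removal is reachable only from
paths where the sysfs add succeeded, which is the property the quoted
label appears to lack.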

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-23  7:13         ` Greg Kroah-Hartman
@ 2021-12-23  7:23           ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-23  7:23 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: baolu.lu, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On 12/23/21 3:13 PM, Greg Kroah-Hartman wrote:
> On Thu, Dec 23, 2021 at 11:02:54AM +0800, Lu Baolu wrote:
>> Hi Greg,
>>
>> On 12/22/21 8:47 PM, Greg Kroah-Hartman wrote:
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static void device_dma_cleanup(struct device *dev, struct device_driver *drv)
>>>> +{
>>>> +	if (!dev->bus->dma_configure)
>>>> +		return;
>>>> +
>>>> +	if (!drv->suppress_auto_claim_dma_owner)
>>>> +		iommu_device_release_dma_owner(dev, DMA_OWNER_DMA_API);
>>>> +}
>>>> +
>>>>    static int really_probe(struct device *dev, struct device_driver *drv)
>>>>    {
>>>>    	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
>>>> @@ -574,11 +601,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>>>>    	if (ret)
>>>>    		goto pinctrl_bind_failed;
>>>> -	if (dev->bus->dma_configure) {
>>>> -		ret = dev->bus->dma_configure(dev);
>>>> -		if (ret)
>>>> -			goto probe_failed;
>>>> -	}
>>>> +	if (device_dma_configure(dev, drv))
>>>> +		goto pinctrl_bind_failed;
>>> Are you sure you are jumping to the proper error path here?  It is not
>>> obvious why you changed this.
>> The error handling path in really_probe() seems a bit wrong. For
>> example,
>>
>> 	/* If using pinctrl, bind pins now before probing */
>> 	ret = pinctrl_bind_pins(dev);
>> 	if (ret)
>> 		goto pinctrl_bind_failed;
>>
>> [...]
>>
>> pinctrl_bind_failed:
>> 	device_links_no_driver(dev);
>> 	devres_release_all(dev);
>> 	arch_teardown_dma_ops(dev);
>> 	kfree(dev->dma_range_map);
>> 	dev->dma_range_map = NULL;
>> 	driver_sysfs_remove(dev);
>> 	^^^^^^^^^^^^^^^^^^^^^^^^^
>> 	dev->driver = NULL;
>> 	dev_set_drvdata(dev, NULL);
>> 	if (dev->pm_domain && dev->pm_domain->dismiss)
>> 		dev->pm_domain->dismiss(dev);
>> 	pm_runtime_reinit(dev);
>> 	dev_pm_set_driver_flags(dev, 0);
>> done:
>> 	return ret;
>>
>> driver_sysfs_remove() will be called even if driver_sysfs_add() hasn't
>> been called yet. I can fix this in a separate patch if I haven't missed
>> anything.
> If this is a bug in the existing kernel, please submit it as a separate
> patch so that it can be properly backported to all affected kernels.
> Never bury it in an unrelated change that will never get sent to older
> kernels.

Sure! I will. Thank you!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-23  5:53             ` Lu Baolu
@ 2021-12-23 14:03               ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-23 14:03 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Robin Murphy, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Thu, Dec 23, 2021 at 01:53:24PM +0800, Lu Baolu wrote:

> > This series is going in the direction of eliminating
> > iommu_attach_group() as part of the driver
> > interface. iommu_attach_group() is repurposed to only be useful for
> > VFIO.
> 
> We can also remove iommu_attach_group() in VFIO because it is
> essentially equivalent to
> 
> 	iommu_group_for_each_dev(group, iommu_attach_device(dev))

Trying to do this would be subtly buggy: remember that the group's device
list is dynamic, so when it is time to detach, this won't reliably balance.

It is the same problem as randomly picking a device inside the group
as the group's 'handle'. There is no guarantee that will work. Only
devices from a driver should be used with the device API.
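
A toy userspace model of the imbalance, assuming a made-up group that only
tracks a member count and a domain that only tracks a user count (neither
matches the real kernel structures):

```c
#include <assert.h>
#include <stddef.h>

struct domain { int users; };

struct group {
	int nr_devs;            /* membership can change at any time */
	struct domain *domain;  /* group-level attachment state */
};

/* Per-device emulation: one attach or detach per *current* member. */
static void attach_each_device(struct group *g, struct domain *d)
{
	for (int i = 0; i < g->nr_devs; i++)
		d->users++;
}

static void detach_each_device(struct group *g, struct domain *d)
{
	for (int i = 0; i < g->nr_devs; i++)
		d->users--;
}

/* Group-level attach: one state change, independent of membership. */
static void attach_group(struct group *g, struct domain *d)
{
	assert(!g->domain);
	g->domain = d;
	d->users++;
}

static void detach_group(struct group *g, struct domain *d)
{
	assert(g->domain == d);
	g->domain = NULL;
	d->users--;
}

/* Attach, hot-add one device, detach; return the leftover user count. */
int per_device_leftover(void)
{
	struct domain d = { 0 };
	struct group g = { 2, NULL };

	attach_each_device(&g, &d);  /* two attaches */
	g.nr_devs++;                 /* a device joins the group */
	detach_each_device(&g, &d);  /* three detaches */
	return d.users;
}

int per_group_leftover(void)
{
	struct domain d = { 0 };
	struct group g = { 2, NULL };

	attach_group(&g, &d);
	g.nr_devs++;                 /* membership change does not matter */
	detach_group(&g, &d);
	return d.users;
}
```

In this model per_device_leftover() ends at -1 because the detach loop
walks a member that was never attached, while per_group_leftover() ends
at 0.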

> > As for why does DMA_OWNER_PRIVATE_DOMAIN_USER exist? VFIO doesn't have
> > an iommu_domain at this point but it still needs the iommu core to
> > detach the default domain. This is what the _USER does.
> 
> There is also a contract that after the USER ownership is claimed the
> device could be accessed by userspace through the MMIO registers. So,
> a device could be accessible by userspace before a user-space I/O
> address space is attached.

If we had an IOMMU domain we could solve this by just assigning the
correct domain. The core issue that motivates USER is the lack of an
iommu_domain.


> > I think it would be clear why iommu_group_set_dma_owner(), which
> > actually does detach, is not the same thing as iommu_attach_device().
> 
> iommu_device_set_dma_owner() will eventually call
> iommu_group_set_dma_owner(). I didn't get why
> iommu_group_set_dma_owner() is special and needs to be kept.

Not quite, they would not call each other, they have different
implementations:

int iommu_device_use_dma_api(struct device *device)
{
	struct iommu_group *group = device->iommu_group;

	if (!group)
		return 0;

	mutex_lock(&group->mutex);
	if (group->owner_cnt != 0 ||
	    group->domain != group->default_domain) {
		mutex_unlock(&group->mutex);
		return -EBUSY;
	}
	group->owner_cnt = 1;
	group->owner = NULL;
	mutex_unlock(&group->mutex);
	return 0;
}

int iommu_group_set_dma_owner(struct iommu_group *group, struct file *owner)
{
	mutex_lock(&group->mutex);
	if (group->owner_cnt != 0) {
		if (group->owner != owner)
			goto err_unlock;
		group->owner_cnt++;
		mutex_unlock(&group->mutex);
		return 0;
	}
	if (group->domain && group->domain != group->default_domain)
		goto err_unlock;

	__iommu_detach_group(group->domain, group);
	group->owner_cnt = 1;
	group->owner = owner;
	mutex_unlock(&group->mutex);
	return 0;

err_unlock:
	mutex_unlock(&group->mutex);
	return -EBUSY;
}

It is the same as how we ended up putting the refcounting logic
directly into the iommu_attach_device().

See, we get rid of the enum as a multiplexor parameter, each API does
only what it needs, they don't call each other.

We don't need _USER anymore because iommu_group_set_dma_owner() always
does detach, and iommu_replace_group_domain() avoids ever reassigning
default_domain. The special USER behavior falls out automatically.
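
To see how the two entry points interact, here is a single-threaded
userspace model of the sketch above. The locking is dropped, a void
pointer stands in for the struct file owner, the detach is modelled by
clearing the domain pointer, and release_dma_owner() is a hypothetical
counterpart that is not spelled out in this mail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct iommu_domain { int id; };

struct iommu_group {
	unsigned int owner_cnt;
	void *owner;                         /* stands in for struct file * */
	struct iommu_domain *domain;         /* currently attached domain */
	struct iommu_domain *default_domain; /* the one the DMA API uses */
};

/* Kernel driver path: only allowed while the default domain is attached. */
int use_dma_api(struct iommu_group *group)
{
	if (group->owner_cnt != 0 || group->domain != group->default_domain)
		return -EBUSY;
	group->owner_cnt = 1;
	group->owner = NULL;
	return 0;
}

/* User path: refcounted per owner, and the first claim always detaches. */
int set_dma_owner(struct iommu_group *group, void *owner)
{
	if (group->owner_cnt != 0) {
		if (group->owner != owner)
			return -EBUSY;
		group->owner_cnt++;
		return 0;
	}
	if (group->domain && group->domain != group->default_domain)
		return -EBUSY;

	group->domain = NULL;	/* models __iommu_detach_group() */
	group->owner_cnt = 1;
	group->owner = owner;
	return 0;
}

/* Hypothetical release: the last put hands the group back to the DMA API. */
void release_dma_owner(struct iommu_group *group)
{
	assert(group->owner_cnt > 0);
	if (--group->owner_cnt == 0) {
		group->owner = NULL;
		group->domain = group->default_domain;
	}
}
```

Once a user owner has claimed the group in this model, use_dma_api() fails
with -EBUSY until every reference is released, without any enum being
passed around.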

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
@ 2021-12-23 14:03               ` Jason Gunthorpe via iommu
  0 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe via iommu @ 2021-12-23 14:03 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Stuart Yoder, rafael, David Airlie, linux-pci, Thierry Reding,
	Diana Craciun, Dmitry Osipenko, Will Deacon, Ashok Raj,
	Jonathan Hunter, Christoph Hellwig, Kevin Tian,
	Chaitanya Kulkarni, Alex Williamson, kvm, Bjorn Helgaas,
	Dan Williams, Greg Kroah-Hartman, Cornelia Huck, linux-kernel,
	Li Yang, iommu, Jacob jun Pan, Daniel Vetter, Robin Murphy

On Thu, Dec 23, 2021 at 01:53:24PM +0800, Lu Baolu wrote:

> > This series is going in the direction of eliminating
> > iommu_attach_group() as part of the driver
> > interface. iommu_attach_group() is repurposed to only be useful for
> > VFIO.
> 
> We can also remove iommu_attach_group() in VFIO because it is
> essentially equivalent to
> 
> 	iommu_group_for_each_dev(group, iommu_attach_device(dev))

Trying to do this would be subtly buggy, remeber the group list is
dynamic so when it is time to detatch this won't reliably balance.

It is the same problem with randomly picking a device inside the group
as the groups 'handle'. There is no guarentee that will work. Only
devices from a driver should be used with the device API.

> > As for why does DMA_OWNER_PRIVATE_DOMAIN_USER exist? VFIO doesn't have
> > an iommu_domain at this point but it still needs the iommu core to
> > detatch the default domain. This is what the _USER does.
> 
> There is also a contract that after the USER ownership is claimed the
> device could be accessed by userspace through the MMIO registers. So,
> a device could be accessible by userspace before a user-space I/O
> address is attached.

If we had an IOMMU domain we could solve this by just assigning the
correct domain. The core issue that motivates USER is the lack of an
iommu_domain.


> > I think it would be clear why iommu_group_set_dma_owner(), which
> > actually does detatch, is not the same thing as iommu_attach_device().
> 
> iommu_device_set_dma_owner() will eventually call
> iommu_group_set_dma_owner(). I didn't get why
> iommu_group_set_dma_owner() is special and need to keep.

Not quite, they would not call each other, they have different
implementations:

int iommu_device_use_dma_api(struct device *device)
{
	struct iommu_group *group = device->iommu_group;

	if (!group)
		return 0;

	mutex_lock(&group->mutex);
	if (group->owner_cnt != 0 ||
	    group->domain != group->default_domain) {
		mutex_unlock(&group->mutex);
		return -EBUSY;
	}
	group->owner_cnt = 1;
	group->owner = NULL;
	mutex_unlock(&group->mutex);
	return 0;
}

int iommu_group_set_dma_owner(struct iommu_group *group, struct file *owner)
{
	mutex_lock(&group->mutex);
	if (group->owner_cnt != 0) {
		if (group->owner != owner)
			goto err_unlock;
		group->owner_cnt++;
		mutex_unlock(&group->mutex);
		return 0;
	}
	if (group->domain && group->domain != group->default_domain)
		goto err_unlock;

	__iommu_detach_group(group->domain, group);
	group->owner_cnt = 1;
	group->owner = owner;
	mutex_unlock(&group->mutex);
	return 0;

err_unlock;
	mutex_unlock(&group->mutex);
	return -EBUSY;
}

It is the same as how we ended up putting the refcounting logic
directly into the iommu_attach_device().

See, we get rid of the enum as a multiplexor parameter, each API does
only what it needs, they don't call each other.

We don't need _USER anymore because iommu_group_set_dma_owner() always
does detach, and iommu_replace_group_domain() avoids ever reassigning
default_domain. The special USER behavior falls out automatically.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-23 14:03               ` Jason Gunthorpe via iommu
@ 2021-12-24  1:30                 ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-24  1:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Robin Murphy, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Jason,

On 12/23/21 10:03 PM, Jason Gunthorpe wrote:
>>> I think it would be clear why iommu_group_set_dma_owner(), which
>>> actually does detach, is not the same thing as iommu_attach_device().
>> iommu_device_set_dma_owner() will eventually call
>> iommu_group_set_dma_owner(). I didn't get why
>> iommu_group_set_dma_owner() is special and needs to be kept.
> Not quite, they would not call each other, they have different
> implementations:
> 
> int iommu_device_use_dma_api(struct device *device)
> {
> 	struct iommu_group *group = device->iommu_group;
> 
> 	if (!group)
> 		return 0;
> 
> 	mutex_lock(&group->mutex);
> 	if (group->owner_cnt != 0 ||
> 	    group->domain != group->default_domain) {
> 		mutex_unlock(&group->mutex);
> 		return -EBUSY;
> 	}
> 	group->owner_cnt = 1;
> 	group->owner = NULL;
> 	mutex_unlock(&group->mutex);
> 	return 0;
> }

It seems that this function doesn't work for multi-device groups. When
the user unbinds all native drivers from the devices in the group, then
binds them to vfio-pci and assigns them to userspace, how could the iommu
core know whether the group is viable for userspace?

> 
> int iommu_group_set_dma_owner(struct iommu_group *group, struct file *owner)
> {
> 	mutex_lock(&group->mutex);
> 	if (group->owner_cnt != 0) {
> 		if (group->owner != owner)
> 			goto err_unlock;
> 		group->owner_cnt++;
> 		mutex_unlock(&group->mutex);
> 		return 0;
> 	}
> 	if (group->domain && group->domain != group->default_domain)
> 		goto err_unlock;
> 
> 	__iommu_detach_group(group->domain, group);
> 	group->owner_cnt = 1;
> 	group->owner = owner;
> 	mutex_unlock(&group->mutex);
> 	return 0;
> 
> err_unlock:
> 	mutex_unlock(&group->mutex);
> 	return -EBUSY;
> }
> 
> It is the same as how we ended up putting the refcounting logic
> directly into the iommu_attach_device().
> 
> See, we get rid of the enum as a multiplexor parameter, each API does
> only what it needs, they don't call each other.

I like the idea of removing the enum parameter and making the API names
specific. But I didn't get why they can't call each other even though the
data in the group is the same.

> 
> We don't need _USER anymore because iommu_group_set_dma_owner() always
> does detach, and iommu_replace_group_domain() avoids ever reassigning
> default_domain. The special USER behavior falls out automatically.

This means we will grow more group-centric interfaces. My understanding
is the opposite: we should hide the concept of a group inside the IOMMU
subsystem, so that device drivers only face device-specific interfaces.

The iommu groups are created by the iommu subsystem. The device drivers
don't play any role in determining which device belongs to which group.
So the iommu interfaces for device drivers shouldn't rely on the group.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-24  1:30                 ` Lu Baolu
@ 2021-12-24  2:50                   ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-24  2:50 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Robin Murphy, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Fri, Dec 24, 2021 at 09:30:17AM +0800, Lu Baolu wrote:
> Hi Jason,
> 
> On 12/23/21 10:03 PM, Jason Gunthorpe wrote:
> > > > I think it would be clear why iommu_group_set_dma_owner(), which
> > > > actually does detach, is not the same thing as iommu_attach_device().
> > > iommu_device_set_dma_owner() will eventually call
> > > iommu_group_set_dma_owner(). I didn't get why
> > > iommu_group_set_dma_owner() is special and needs to be kept.
> > Not quite, they would not call each other, they have different
> > implementations:
> > 
> > int iommu_device_use_dma_api(struct device *device)
> > {
> > 	struct iommu_group *group = device->iommu_group;
> > 
> > 	if (!group)
> > 		return 0;
> > 
> > 	mutex_lock(&group->mutex);
> > 	if (group->owner_cnt != 0 ||
> > 	    group->domain != group->default_domain) {
> > 		mutex_unlock(&group->mutex);
> > 		return -EBUSY;
> > 	}
> > 	group->owner_cnt = 1;
> > 	group->owner = NULL;
> > 	mutex_unlock(&group->mutex);
> > 	return 0;
> > }
> 
> It seems that this function doesn't work for multi-device groups. When
> the user unbinds all native drivers from the devices in the group, then
> binds them to vfio-pci and assigns them to userspace, how could the iommu
> core know whether the group is viable for userspace?

That is just a mistake, I wrote this very quickly. It should work the
way your patch had it, with a ++. More like this:

int iommu_device_use_dma_api(struct device *device)
{
	struct iommu_group *group = device->iommu_group;

	if (!group)
		return 0;

	mutex_lock(&group->mutex);
	if (group->owner_cnt != 0) {
		if (group->domain != group->default_domain ||
		    group->owner != NULL) {
			mutex_unlock(&group->mutex);
			return -EBUSY;
		}
	}
	group->owner_cnt++;
	mutex_unlock(&group->mutex);
	return 0;
}

> > See, we get rid of the enum as a multiplexor parameter, each API does
> > only what it needs, they don't call each other.
> 
> I like the idea of removing the enum parameter and making the API names
> specific. But I didn't get why they can't call each other even though the
> data in the group is the same.

Well, I think when you type them out you'll find they don't work the
same. I.e. iommu_group_set_dma_owner() does __iommu_detach_group(),
which iommu_device_use_dma_api() definitely doesn't want to
do. iommu_device_use_dma_api() checks the domain while
iommu_group_set_dma_owner() must not.

This is basically the issue: all the places touching owner_cnt are
superficially the same but each uses a different predicate. Given the
predicate is more than half the code I wouldn't try to share the rest
of it. But maybe when it is all typed in something will become
obvious?
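
To make the difference in predicates concrete, here is a minimal
userspace model of the two entry points. It is only a sketch, not kernel
code: struct group_model and the model_*() names are hypothetical
stand-ins, locking is left out, and clearing the domain pointer merely
stands in for __iommu_detach_group().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ownership fields of struct iommu_group. */
struct group_model {
	unsigned int owner_cnt;
	const void *owner;		/* NULL means "kernel DMA API" */
	const void *domain;		/* currently attached domain */
	const void *default_domain;
};

/* Models the revised iommu_device_use_dma_api(): checks the domain. */
static int model_use_dma_api(struct group_model *g)
{
	if (g->owner_cnt != 0 &&
	    (g->domain != g->default_domain || g->owner != NULL))
		return -EBUSY;
	g->owner_cnt++;
	return 0;
}

/* Models iommu_group_set_dma_owner(): checks the owner, then detaches. */
static int model_set_dma_owner(struct group_model *g, const void *owner)
{
	if (g->owner_cnt != 0) {
		if (g->owner != owner)
			return -EBUSY;
		g->owner_cnt++;
		return 0;
	}
	if (g->domain && g->domain != g->default_domain)
		return -EBUSY;
	g->domain = NULL;		/* stands in for __iommu_detach_group() */
	g->owner_cnt = 1;
	g->owner = owner;
	return 0;
}
```

Only the counter update is common to both paths. The conditions guarding
it, and the detach side effect, differ.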

> > We don't need _USER anymore because iommu_group_set_dma_owner() always
> > does detach, and iommu_replace_group_domain() avoids ever reassigning
> > default_domain. The special USER behavior falls out automatically.
> 
> This means we will grow more group-centric interfaces. My understanding
> is the opposite: we should hide the concept of a group inside the IOMMU
> subsystem, so that device drivers only face device-specific interfaces.

Ideally group interfaces would be reduced, but in this case VFIO needs
the group. It has a fundamental problem with its uAPI, which expects
the container to be fully set up with a domain at the moment the
group is attached. So deferring domain setup to when the device is
available becomes a user-visible artifact. Whether that matters is a
whole research question that isn't really important for this series.

We also can't just pull a device out of thin air, a device that hasn't
been probed() hasn't even had dma_configure called! Let alone the
lifetime and locking problems with that kind of idea.

So leaving it as a group interface makes the most sense,
particularly for this series, which is really about fixing the sharing
model in the iommu core and deleting the BUG_ONs.

Also, I'm sitting here looking at Robin's idea that
iommu_attach_device() and iommu_attach_device_shared() should be the
same - and that does seem conceptually appealing, but not so simple.

The difference is that iommu_attach_device_shared() requires the
device_driver to have set suppress_auto_claim_dma_owner while
iommu_attach_device() does not (Lu, please do add a kdoc comment
documenting this, and maybe a WARN_ON check to enforce it).

Changing all 11 drivers using iommu_attach_device() to also set
suppress_auto_claim_dma_owner is something to do in another series,
merged properly through the driver trees, if it is done at all. So
this series needs to keep both APIs.

However, what we should be doing is fixing iommu_attach_device() to
rely on the owner_cnt, and not iommu_group_device_count().

Basically its logic should instead check for owner_cnt == 1 and
then transform the group from a DMA_OWNER_DMA_API to a
DMA_OWNER_PRIVATE_DOMAIN. If we get rid of the enum then this happens
naturally by making group->domain != group->default_domain. All that
is missing is the owner_cnt == 1 check and some commentary. Again
also with a WARN_ON and documentation that
suppress_auto_claim_dma_owner is not set. (TBH, I thought this was
discussed already, I haven't yet carefully checked v4..)

Then, we rely on iommu_device_use_dma_api() to block further users of
the group and remove the iommu_group_device_count() hack.
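
As a rough illustration of the owner_cnt == 1 check, here is a userspace
sketch under assumptions. struct attach_model and model_attach_device()
are hypothetical names, locking and the WARN_ON are omitted, and the
"already on a private domain" rejection is an assumption about what the
real function would keep doing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of struct iommu_group. */
struct attach_model {
	unsigned int owner_cnt;
	const void *domain;
	const void *default_domain;
};

/*
 * Models iommu_attach_device() relying on owner_cnt rather than on
 * the number of devices in the group.
 */
static int model_attach_device(struct attach_model *g, const void *new_domain)
{
	/* Only a sole DMA API user may move the group to a private domain. */
	if (g->owner_cnt != 1)
		return -EBUSY;
	/* Assumption: a group already on a private domain is rejected. */
	if (g->domain != g->default_domain)
		return -EBUSY;
	/* domain != default_domain now marks the group as private. */
	g->domain = new_domain;
	return 0;
}
```

Once the domain differs from default_domain, a later
iommu_device_use_dma_api() style check on the same group would fail,
which is what blocks further users.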

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-22 20:26         ` Robin Murphy
@ 2021-12-24  3:19           ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-24  3:19 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On 12/23/21 4:26 AM, Robin Murphy wrote:
> On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
>> On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:
>>
>>> this proposal is the worst of both worlds, in that drivers still have
>>> to be just as aware of groups in order to know whether to call the
>>> _shared interface or not, except it's now entirely implicit and
>>> non-obvious.
>>
>> Drivers are not aware of groups, where did you see that?
> 
> `git grep iommu_attach_group -- :^drivers/iommu :^include`
> 
> Did I really have to explain that?
> 
> The drivers other than vfio_iommu_type1, however, do have a complete 
> failure to handle, or even consider, any group that does not fit the 
> particular set of assumptions they are making, but at least they only 
> work in a context where that should not occur.
> 
>> Drivers have to indicate their intention, based entirely on their own
>> internal design. If groups are present, or not is irrelevant to the
>> driver.
>>
>> If the driver uses a single struct device (which is most) then it uses
>> iommu_attach_device().
>>
>> If the driver uses multiple struct devices and intends to connect them
>> all to the same domain then it uses the _shared variant. The only
>> difference between the two is the _shared variant lacks some of the
>> protections against driver abuse of the API.
> 
> You've lost me again; how are those intentions any different? Attaching 
> one device to a private domain is a literal subset of attaching more 
> than one device to a private domain. There is no "abuse" of any API 
> anywhere; the singleton group restriction exists as a protective measure 
> because iommu_attach_device() was already in use before groups were 
> really a thing, in contexts where groups happened to be singleton 
> already, but anyone adding *new* uses in contexts where that assumption 
> might *not* hold would be in trouble. Thus it enforces DMA ownership by 
> the most trivial and heavy-handed means of simply preventing it ever 
> becoming shared in the first place.
> 
> Yes, I'm using the term "DMA ownership" in a slightly different context 
> to the one in which you originally proposed it. Please step out of the 
> userspace-device-assignment-focused bubble for a moment and stay with me...
> 
> So then we have the iommu_attach_group() interface for new code (and 
> still nobody has got round to updating the old code to it yet), for 
> which the basic use-case is still fundamentally "I want to attach my 
> thing to my domain", but at least now forcing explicit awareness that 
> "my thing" could possibly be inextricably intertwined with more than 
> just the one device they expect, so potential callers should have a good 
> think about that. Unfortunately this leaves the matter of who "owns" the 
> group entirely in the hands of those callers, which as we've now 
> concluded is not great.
> 
> One of the main reasons for non-singleton groups to occur is due to ID 
> aliasing or lack of isolation well beyond the scope and control of 
> endpoint devices themselves, so it's not really fair to expect every 
> IOMMU-aware driver to also be aware of that, have any idea of how to 
> actually handle it, or especially try to negotiate with random other 
> drivers as to whether it might be OK to take control of their DMA 
> address space too. The whole point is that *every* domain attach really 
> *has* to be considered "shared" because in general drivers can't know 
> otherwise. Hence the easy, if crude, fix for the original API.
> 
>> Nothing uses the group interface except for VFIO and stuff inside
>> drivers/iommu. VFIO has a uAPI tied to the group interface and it
>> is stuck with it.
> 
> Self-contradiction is getting stronger, careful...
>>> Otherwise just add the housekeeping stuff to
>>> iommu_{attach,detach}_group() - there's no way we want *three*
>>> attach/detach interfaces all with different semantics.
>>
>> I'm not sure why you think 3 APIs is a bad thing. Three APIs, with
>> clearly intended purposes is a lot better than one giant API with a
>> bunch of parameters that tries to do everything.
> 
> Because there's only one problem to solve! We have the original API 
> which does happen to safely enforce ownership, but in an implicit way 
> that doesn't scale; then we have the second API which got past the 
> topology constraint but unfortunately turns out to just be unsafe in a 
> slightly different way, and was supposed to replace the first one but 
> hasn't, and is a bit clunky to boot; now you're proposing a third one 
> which can correctly enforce safe ownership for any group topology, which 
> is simply combining the good bits of the first two. It makes no sense to 
> maintain two bad versions of a thing alongside one which works better.
> 
> I don't see why anything would be a giant API with a bunch of parameters 
> - depending on how you look at it, this new proposal is basically either 
> iommu_attach_device() with the ability to scale up to non-trivial groups 
> properly, or iommu_attach_group() with a potentially better interface 
> and actual safety. The former is still more prevalent (and the interface 
> argument compelling), so if we put the new implementation behind that, 
> with the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN 
> automatically, kill off iommu_attach_group() by converting its couple of 
> users, and not only have we solved the VFIO problem but we've also 
> finally updated all the legacy code for free! Of course you can have a 
> separate version for VFIO to attach with DMA_OWNER_PRIVATE_DOMAIN_USER 
> if you like, although I still fail to understand the necessity of the 
> distinction.
> 
>> In this case, it is not simple to 'add the housekeeping' to
>> iommu_attach_group() in a way that is useful to both tegra and
>> VFIO. What tegra wants is what the _shared API implements, and that
>> logic should not be open coded in drivers.
>>
>> VFIO does not want exactly that, it has its own logic to deal directly
>> with groups tied to its uAPI. Due to the uAPI it doesn't even have a
>> struct device, unfortunately.
> 
> Nope. VFIO has its own logic to deal with groups because it's the only 
> thing that's ever actually tried dealing with groups correctly 
> (unsurprisingly, given that it's where they came from), and every other 
> private IOMMU domain user is just crippled or broken to some degree. All 
> that proves is that we really should be policing groups better in the 
> IOMMU core, per this series, because actually fixing all the other users 
> to properly validate their device's group would be a ridiculous mess.
> 
> What VFIO wants is (conceptually[1]) "attach this device to my domain, 
> provided it and any other devices in its group are managed by a driver I 
> approve of." Surprise surprise, that's what any other driver wants as 
> well! For iommu_attach_device() it was originally implicit, and is now 
> further enforced by the singleton group restriction. For Tegra/host1x 
> it's implicit in the complete obliviousness to the possibility of that 
> not being the case.
> 
> Of course VFIO has a struct device if it needs one; it's trivial to 
> resolve the member(s) of a group (and even more so once we can assume 
> that a group may only ever contain mutually-compatible devices in the 
> first place). How do you think vfio_bus_type() works?
> 
> VFIO will also need a struct device anyway, because once I get back from 
> my holiday in the new year I need to start working with Simon on 
> evolving the rest of the API away from bus->iommu_ops to dev->iommu so 
> we can finally support IOMMU drivers coexisting[2].
> 
>> The reason there are three APIs is because there are three different
>> use-cases. It is not a bad thing to have APIs designed for the use cases
>> they serve.
> 
> Indeed I agree with that second point, I'm just increasingly baffled how 
> it's not clear to you that there is only one fundamental use-case here. 
> Perhaps I'm too familiar with the history to objectively see how unclear 
> the current state of things might be :/
> 
>>> It's worth taking a step back and realising that overall, this is really
>>> just a more generalised and finer-grained extension of what 426a273834ea
>>> already did for non-group-aware code, so it makes little sense *not* to
>>> integrate it into the existing interfaces.
>>
>> This is taking 426a to its logical conclusion and *removing* the
>> group API from the drivers entirely. This is desirable because drivers
>> cannot do anything sane with the group.
> 
> I am in complete agreement with that (to the point of also not liking 
> patch #6).
> 
>> The drivers have struct devices, and so we provide APIs that work in
>> terms of struct devices to cover both driver use cases today, and do
>> so more safely than what is already implemented.
> 
> I am in complete agreement with that (given "both" of the supposed 3 
> use-cases all being the same).
> 
>> Do not mix up VFIO with the driver interface, these are different
>> things. It is better VFIO stay on its own and not complicate the
>> driver world.
> 
> Nope, vfio_iommu_type1 is just a driver, calling the IOMMU API just like 
> any other driver. I like the little bit where it passes itself to 
> vfio_register_iommu_driver(), which I feel gets this across far more 
> poetically than I can manage.
> 
> Thanks,
> Robin.
> 
> [1] Yes, due to the UAPI it actually starts with the whole group rather 
> than any particular device within it. Don't nitpick.
> [2] https://lore.kernel.org/linux-iommu/2021052710373173260118@rock-chips.com/

Let me summarize what I've got from the above comments.

1. Essentially we only need the interfaces below for device drivers to
    manage I/O address space conflicts in the iommu layer:

int iommu_device_set/release/query_kernel_dma(struct device *dev)

- set: The device driver lets the iommu layer know that its DMA goes
   through the kernel DMA API. The iommu layer should use the default
   domain for DMA remapping. No other domain can be attached.
- release: The device driver lets the iommu layer know that it doesn't
   do DMA anymore and other domains are allowed to be attached.
- query: The device driver asks "can I only do DMA through the kernel
   DMA API? In other words, can I attach my own domain?"


int iommu_device_set/release_private_dma(struct device *dev)

- set: The device driver lets the iommu layer know that it wants to use
   its own iommu domain. The iommu layer should detach the default
   domain and allow the driver to attach or detach its own domain
   through the iommu_attach/detach_device() interfaces.
- release: The device driver lets the iommu layer know that it no
   longer needs a private domain.

2. iommu_attach_group() vs. iommu_attach_device()

   [HISTORY]
   iommu_attach_device() was first added by commit <fc2100eb4d096> ("add
   frontend implementation for the IOMMU API") in 2008. At that time,
   there was no concept of an iommu group yet.

   The iommu group was added by commit <d72e31c937462> ("iommu: IOMMU
   Groups") four years later in 2012. iommu_attach_group() was added
   at the same time.

   Then people realized that iommu_attach_device() allowed different
   devices in the same group to attach to different domains. This was
   not in line with the concept of an iommu group. Commit
   <426a273834eae> ("iommu: Limit iommu_attach/detach_device to device
   with their own group") fixed this problem in 2015.

   [REALITY]
   We have two coexisting interfaces for device drivers to do the same
   thing. But neither is perfect:

   - iommu_attach_device() only works for singleton groups.
   - iommu_attach_group() asks the device drivers to handle iommu group
     related stuff, which is beyond the role of a device driver.

   [FUTURE]
   From the perspective of a device driver, the motivation is very
   simple: "I want to manage my own I/O address space. The kernel DMA
   API is not suitable for me because it hides the I/O address space
   details in a lower layer that I cannot see."

   We consider heading in this direction:

   Make iommu_attach_device() the only, generic interface for device
   drivers to use their own private domain (I/O address space), replace
   all iommu_attach_group() uses with iommu_attach_device(), and
   deprecate iommu_attach_group().
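
To make point 1 a bit more concrete, here is a rough userspace model of
the two interface families. It is only a sketch: the toy_* names and
struct toy_group are made up for illustration, the state is tracked per
group rather than per struct device, and it assumes that a private
claim is exclusive, which the summary above does not actually say.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-group ownership state, not a kernel type. */
enum toy_dma_mode { TOY_DMA_NONE, TOY_DMA_KERNEL, TOY_DMA_PRIVATE };

struct toy_group {
	enum toy_dma_mode mode;
	unsigned int users;	/* devices currently holding 'mode' */
};

/* Driver DMA goes through the kernel DMA API; may be claimed repeatedly. */
int toy_set_kernel_dma(struct toy_group *g)
{
	if (g->users && g->mode != TOY_DMA_KERNEL)
		return -EBUSY;
	g->mode = TOY_DMA_KERNEL;
	g->users++;
	return 0;
}

/* Driver wants its own domain; assumed exclusive in this model. */
int toy_set_private_dma(struct toy_group *g)
{
	if (g->users)
		return -EBUSY;
	g->mode = TOY_DMA_PRIVATE;
	g->users = 1;
	return 0;
}

/* Common release path: the last user returns the group to "unclaimed". */
static void toy_release(struct toy_group *g, enum toy_dma_mode mode)
{
	if (!g->users || g->mode != mode)
		return;
	if (--g->users == 0)
		g->mode = TOY_DMA_NONE;
}

void toy_release_kernel_dma(struct toy_group *g)
{
	toy_release(g, TOY_DMA_KERNEL);
}

void toy_release_private_dma(struct toy_group *g)
{
	toy_release(g, TOY_DMA_PRIVATE);
}

/* True while at least one device in the group uses the kernel DMA API. */
bool toy_query_kernel_dma(const struct toy_group *g)
{
	return g->users && g->mode == TOY_DMA_KERNEL;
}
```

In this model a second kernel-DMA claim on the same group succeeds, a
private claim is refused with -EBUSY until every kernel-DMA user has
released, and the reverse holds while a private claim is active.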

That's all. Did I miss or misunderstand anything?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-24  2:50                   ` Jason Gunthorpe via iommu
@ 2021-12-24  6:44                     ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-24  6:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Robin Murphy, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Jason,

On 2021/12/24 10:50, Jason Gunthorpe wrote:
> On Fri, Dec 24, 2021 at 09:30:17AM +0800, Lu Baolu wrote:
>> Hi Jason,
>>
>> On 12/23/21 10:03 PM, Jason Gunthorpe wrote:
>>>>> I think it would be clear why iommu_group_set_dma_owner(), which
>>>>> actually does detach, is not the same thing as iommu_attach_device().
>>>> iommu_device_set_dma_owner() will eventually call
>>>> iommu_group_set_dma_owner(). I didn't get why
>>>> iommu_group_set_dma_owner() is special and needs to be kept.
>>> Not quite, they would not call each other, they have different
>>> implementations:
>>>
>>> int iommu_device_use_dma_api(struct device *device)
>>> {
>>> 	struct iommu_group *group = device->iommu_group;
>>>
>>> 	if (!group)
>>> 		return 0;
>>>
>>> 	mutex_lock(&group->mutex);
>>> 	if (group->owner_cnt != 0 ||
>>> 	    group->domain != group->default_domain) {
>>> 		mutex_unlock(&group->mutex);
>>> 		return -EBUSY;
>>> 	}
>>> 	group->owner_cnt = 1;
>>> 	group->owner = NULL;
>>> 	mutex_unlock(&group->mutex);
>>> 	return 0;
>>> }
>> It seems that this function doesn't work for multi-device groups. When
>> the user unbinds all native drivers from devices in the group and starts
>> to bind them to vfio-pci and assign them to the user, how could iommu
>> know whether the group is viable for the user?
> It is just a mistake, I made this very fast. It should work as your
> patch had it with a ++. More like this:
> 
> int iommu_device_use_dma_api(struct device *device)
> {
> 	struct iommu_group *group = device->iommu_group;
> 
> 	if (!group)
> 		return 0;
> 
> 	mutex_lock(&group->mutex);
> 	if (group->owner_cnt != 0) {
> 		if (group->domain != group->default_domain ||
> 		    group->owner != NULL) {
> 			mutex_unlock(&group->mutex);
> 			return -EBUSY;
> 		}
> 	}
> 	group->owner_cnt++;
> 	mutex_unlock(&group->mutex);
> 	return 0;
> }
> 
>>> See, we get rid of the enum as a multiplexor parameter, each API does
>>> only what it needs, they don't call each other.
>> I like the idea of removing the enum parameter and making the API names
>> specific. But I didn't get why they can't call each other even though
>> the data in the group is the same.
> Well, I think when you type them out you'll find they don't work the
> same. I.e. iommu_group_set_dma_owner() does __iommu_detach_group(),
> which iommu_device_use_dma_api() definitely doesn't want to
> do. iommu_device_use_dma_api() checks the domain while
> iommu_group_set_dma_owner() must not.
> 
> This is basically the issue, all the places touching ownercount are
> superficially the same but each uses different predicates. Given the
> predicate is more than half the code I wouldn't try to share the rest
> of it. But maybe when it is all typed in something will become
> obvious?
> 

Got it, and I agree with you. For the remaining comments, let me wait
and hear what Robin has to say.
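
For what it's worth, the difference between the two drafts quoted above
is easy to see in a small userspace model. This is only a sketch:
struct toy_owner_group and the toy_* function names are hypothetical,
the mutex is dropped because the model is single threaded, and the
fields merely mirror the ones used in the quoted code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the group fields used in the quoted drafts. */
struct toy_owner_group {
	unsigned int owner_cnt;
	const void *owner;		/* non-NULL once userspace owns the group */
	const void *domain;		/* currently attached domain */
	const void *default_domain;
};

/* First draft: any existing owner blocks the claim. */
int toy_use_dma_api_v1(struct toy_owner_group *g)
{
	if (!g)
		return 0;
	if (g->owner_cnt != 0 || g->domain != g->default_domain)
		return -EBUSY;
	g->owner_cnt = 1;
	g->owner = NULL;
	return 0;
}

/* Corrected draft: count the claims, refuse only conflicting owners. */
int toy_use_dma_api_v2(struct toy_owner_group *g)
{
	if (!g)
		return 0;
	if (g->owner_cnt != 0) {
		if (g->domain != g->default_domain || g->owner != NULL)
			return -EBUSY;
	}
	g->owner_cnt++;
	return 0;
}
```

With two devices in one group, the first draft refuses the second claim
with -EBUSY, while the ++ variant counts both and still refuses once
the owner field is set.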

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-24  3:19           ` Lu Baolu
@ 2021-12-24 14:24             ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-24 14:24 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Robin Murphy, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Fri, Dec 24, 2021 at 11:19:44AM +0800, Lu Baolu wrote:

> Let me summarize what I've got from the comments above.
> 
> 1. Essentially, we only need the interfaces below for device drivers to
>    manage I/O address space conflicts in the iommu layer:
> 
> int iommu_device_set/release/query_kernel_dma(struct device *dev)
> 
> - Device driver lets the iommu layer know that driver DMAs go through
>   the kernel DMA APIs. The iommu layer should use the default domain
>   for DMA remapping. No other domains can be attached.
> - Device driver lets the iommu layer know that driver doesn't do DMA
>   anymore and other domains are allowed to be attached.
> - Device driver queries "can I only do DMA through the kernel DMA API?
>   In other words, can I attach my own domain?"

I'm not sure I see the utility of a query, but OK - this is the API
family v4 has added to really_probe, basically.

> int iommu_device_set/release_private_dma(struct device *dev)
> 
> - Device driver lets the iommu layer know that it wants to use its own
>   iommu domain. The iommu layer should detach the default domain and
>   allow the driver to attach or detach its own domain through
>   iommu_attach/detach_device() interfaces.
> - Device driver lets the iommu layer know that it no longer needs a
>   private domain.

Drivers don't actually need an interface like this, they all have
domains so they can all present their domain when they want to change
the ownership mode.

The advantage of presenting the domain in the API is that it allows
the core code to support sharing. Present the same domain and your
device gets to join the group. Present a different domain and it is
rejected. Simple.
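
A sketch of the "present the domain" rule described above, as a
userspace model. struct toy_domain, struct toy_shared_group and the
toy_* functions are invented for this example and are not the
signatures proposed in the series; the point is only that comparing the
presented domain against the attached one is enough to decide between
joining and rejecting.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct toy_domain {
	int id;
};

struct toy_shared_group {
	struct toy_domain *domain;	/* attached private domain, or NULL */
	unsigned int owner_cnt;		/* devices attached to 'domain' */
};

/* Same domain: join the existing owners. Different domain: rejected. */
int toy_attach_device_shared(struct toy_shared_group *g, struct toy_domain *d)
{
	if (g->owner_cnt && g->domain != d)
		return -EBUSY;
	g->domain = d;
	g->owner_cnt++;
	return 0;
}

/* The last device to detach leaves the group without a private domain. */
void toy_detach_device_shared(struct toy_shared_group *g, struct toy_domain *d)
{
	if (!g->owner_cnt || g->domain != d)
		return;
	if (--g->owner_cnt == 0)
		g->domain = NULL;
}
```

Two devices presenting the same domain both succeed; a third caller
presenting a different domain gets -EBUSY until the group has been
fully detached.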

Since there is no domain the above APIs cannot support tegra, for
instance.

>   Make iommu_attach_device() the only, generic interface for device
>   drivers to use their own private domain (I/O address space), replace
>   all iommu_attach_group() uses with iommu_attach_device(), and
>   deprecate iommu_attach_group().

Certainly in the device drivers, yes; VFIO should stay with groups, as
I've explained.

Ideals aside, we still need this series to have a scope that is
achievable at a reasonable size. So, we still end up with three
interfaces:

 1) iommu_attach_device() as used by the 11 current drivers that do
    not set suppress_auto_claim_dma_owner.
    Its key property is that it is API compatible with what we have
    today and doesn't require changing the 11 drivers.

 2) iommu_attach_device_shared() which is used by tegra and requires
    that drivers set suppress_auto_claim_dma_owner.

    A followup series could replace all calls of iommu_attach_device()
    with iommu_attach_device_shared() with one patch per driver that
    also sets suppress_auto_claim_dma_owner.

 3) Unless a better idea arises, the
    iommu_group_set_dma_owner()/iommu_replace_group_domain()
    API that I suggested, used only by VFIO. This API is designed to
    work without a domain and uses the 'struct file *owner' instead
    of the domain to permit sharing. It swaps the obviously confusing
    concept of _USER for the more general concept of 'replace domain'.

All three need to consistently use owner_cnt and the related fields to
implement their internal logic.

It is a pretty clear explanation why there are three interfaces.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-17  6:36   ` Lu Baolu
@ 2021-12-29 20:42     ` Bjorn Helgaas
  -1 siblings, 0 replies; 94+ messages in thread
From: Bjorn Helgaas @ 2021-12-29 20:42 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On Fri, Dec 17, 2021 at 02:36:58PM +0800, Lu Baolu wrote:
> The pci_dma_configure() marks the iommu_group as containing only devices
> with kernel drivers that manage DMA.

I'm looking at pci_dma_configure(), and I don't see the connection to
iommu_groups.

> Avoid this default behavior for the
> pci_stub because it does not program any DMA itself.  This allows the
> pci_stub to still be used by the admin to block driver binding after
> applying the DMA ownership to vfio.

> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>  	.name		= "pci-stub",
>  	.id_table	= NULL,	/* only dynamic id's */
>  	.probe		= pci_stub_probe,
> +	.driver		= {
> +		.suppress_auto_claim_dma_owner = true,

The new .suppress_auto_claim_dma_owner controls whether we call
iommu_device_set_dma_owner().  I guess you added
.suppress_auto_claim_dma_owner because iommu_device_set_dma_owner()
must be done *before* we call the driver's .probe() method?

Otherwise, we could call some new interface from .probe() instead of
adding the flag to struct device_driver.

> +	},
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 04/13] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-12-17  6:36   ` Lu Baolu
@ 2021-12-29 21:16     ` Bjorn Helgaas
  -1 siblings, 0 replies; 94+ messages in thread
From: Bjorn Helgaas @ 2021-12-29 21:16 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge. The existing vfio framework allows the portdrv driver to
> be bound to the bridge while its downstream devices are assigned to user
> space. The pci_dma_configure() marks the iommu_group as containing only
> devices with kernel drivers that manage DMA. Avoid this default behavior
> for the portdrv driver in order for compatibility with the current vfio
> policy.

A word about the isolation would be useful.  I think you're referring
to some specific ACS controls, probably P2P Request Redirect?

I guess this is just a wording issue, but I think it's actually the
*lack* of some ACS controls that forces us to put several devices in
the same IOMMU group, isn't it?  It's not that we start with "IOMMU
grouping" and that necessitates something else.

Maybe something like this?

  If a switch lacks ACS P2P Request Redirect (and possibly other
  controls?), a device below the switch can bypass the IOMMU and DMA
  directly to other devices below the switch, so all the downstream
  devices must be in the same IOMMU group as the switch itself.
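
The grouping rule in the suggested wording can be sketched as a small
predicate. This is only a userspace illustration: the bit values are
assumed to mirror the ACS control bits in include/uapi/linux/pci_regs.h,
and REQUIRED_ACS, port_isolates_downstream() and group_for_downstream()
are hypothetical names, not kernel interfaces. The real grouping code
also has to consider aliases, multifunction devices and quirks, none of
which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACS control bits; values assumed to mirror include/uapi/linux/pci_regs.h. */
#define ACS_SV 0x0001 /* Source Validation */
#define ACS_RR 0x0004 /* P2P Request Redirect */
#define ACS_CR 0x0008 /* P2P Completion Redirect */
#define ACS_UF 0x0010 /* Upstream Forwarding */

/* Hypothetical policy: controls a port must enable to isolate what is below it. */
#define REQUIRED_ACS (ACS_SV | ACS_RR | ACS_CR | ACS_UF)

/* True when every required control is enabled, so P2P traffic goes via the IOMMU. */
static bool port_isolates_downstream(uint16_t acs_ctrl)
{
	return (acs_ctrl & REQUIRED_ACS) == REQUIRED_ACS;
}

/*
 * Pick the group id for a device below a port. Without isolation the
 * device can DMA directly to its peers, so it must share the port's
 * group; with isolation it may get a group of its own.
 */
static int group_for_downstream(uint16_t port_acs_ctrl, int port_group,
				int fresh_group)
{
	assert(port_group != fresh_group);
	return port_isolates_downstream(port_acs_ctrl) ? fresh_group : port_group;
}
```

With P2P Request Redirect missing, group_for_downstream() returns the
port's own group id, which is the case the commit log is describing.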

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
> policy to all kernel drivers of bridge class. This is not always safe.
> For example, The shpchp_core driver relies on the PCI MMIO access for the
> controller functionality. With its downstream devices assigned to the
> userspace, the MMIO might be changed through user initiated P2P accesses
> without any notification. This might break the kernel driver integrity
> and lead to some unpredictable consequences.
> 
> For any bridge driver, in order to avoiding default kernel DMA ownership
> claiming, we should consider:
> 
>  1) Does the bridge driver use DMA? Calling pci_set_master() or
>     a dma_map_* API is a sure indicate the driver is doing DMA
> 
>  2) If the bridge driver uses MMIO, is it tolerant to hostile
>     userspace also touching the same MMIO registers via P2P DMA
>     attacks?
> 
> Conservatively if the driver maps an MMIO region at all, we can say that
> it fails the test.

I'm not sure what all this explanation is telling me.  It says
something done by 5f096b14d421 is not always safe, but this patch
doesn't fix any of those unsafe things.

If it doesn't explain why we need this patch or how this patch works,
I don't think we need it in the commit log.

Maybe this is an explanation for why you didn't set
.suppress_auto_claim_dma_owner for shpc_driver?

Minor typos above:
  s/in order to avoiding default/before avoiding default/
  s/relies on the PCI MMIO access/relies on PCI MMIO access/
  s/For example, The/For example, the/
  s/is a sure indicate the/is a sure indication the/

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..c48a8734f9c4 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> -	.driver.pm	= PCIE_PORTDRV_PM_OPS,
> +	.driver		= {
> +		.pm = PCIE_PORTDRV_PM_OPS,
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>  
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-29 20:42     ` Bjorn Helgaas
@ 2021-12-30  5:34       ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-30  5:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Dan Williams, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Bjorn,

On 12/30/21 4:42 AM, Bjorn Helgaas wrote:
> On Fri, Dec 17, 2021 at 02:36:58PM +0800, Lu Baolu wrote:
>> The pci_dma_configure() marks the iommu_group as containing only devices
>> with kernel drivers that manage DMA.
> 
> I'm looking at pci_dma_configure(), and I don't see the connection to
> iommu_groups.

The 2nd patch "driver core: Set DMA ownership during driver bind/unbind"
sets all drivers' DMA to be kernel-managed by default, except for a few
that have a driver flag set. So by default, all iommu groups contain
only devices with kernel drivers managing DMA.
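
A rough userspace model of that default may help. Everything below is a
mock written for illustration: only the flag name
suppress_auto_claim_dma_owner comes from this series, while the mock_*
types and functions are hypothetical and do not reproduce the actual
driver-core code in patch 2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum dma_owner { DMA_OWNER_NONE, DMA_OWNER_KERNEL, DMA_OWNER_USER };

/* Mock iommu group: one owner type shared by all devices in the group. */
struct mock_group {
	enum dma_owner owner;
	int owner_cnt;
};

struct mock_driver {
	bool suppress_auto_claim_dma_owner;	/* flag name from this series */
	int (*probe)(void);			/* optional mock probe callback */
};

struct mock_device {
	struct mock_group *group;
};

/* Claim the group for one owner type; mixing owner types is refused. */
static int mock_set_dma_owner(struct mock_group *g, enum dma_owner owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

static void mock_release_dma_owner(struct mock_group *g)
{
	assert(g->owner_cnt > 0);
	if (--g->owner_cnt == 0)
		g->owner = DMA_OWNER_NONE;
}

/* Bind path: claim kernel ownership by default, skip it when the flag is set. */
static int mock_bind(struct mock_device *dev, const struct mock_driver *drv)
{
	bool claim = !drv->suppress_auto_claim_dma_owner;
	int ret;

	if (claim) {
		ret = mock_set_dma_owner(dev->group, DMA_OWNER_KERNEL);
		if (ret)
			return ret;
	}

	ret = drv->probe ? drv->probe() : 0;
	if (ret && claim)
		mock_release_dma_owner(dev->group);	/* undo on probe failure */
	return ret;
}
```

In this model a driver with the flag set can still bind to a device
whose group is user-owned, while an ordinary driver gets -EBUSY.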

> 
>> Avoid this default behavior for the
>> pci_stub because it does not program any DMA itself.  This allows the
>> pci_stub still able to be used by the admin to block driver binding after
>> applying the DMA ownership to vfio.
> 
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/pci/pci-stub.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
>> index e408099fea52..6324c68602b4 100644
>> --- a/drivers/pci/pci-stub.c
>> +++ b/drivers/pci/pci-stub.c
>> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>>   	.name		= "pci-stub",
>>   	.id_table	= NULL,	/* only dynamic id's */
>>   	.probe		= pci_stub_probe,
>> +	.driver		= {
>> +		.suppress_auto_claim_dma_owner = true,
> 
> The new .suppress_auto_claim_dma_owner controls whether we call
> iommu_device_set_dma_owner().  I guess you added
> .suppress_auto_claim_dma_owner because iommu_device_set_dma_owner()
> must be done *before* we call the driver's .probe() method?

As explained above, all drivers are set to kernel-managed DMA by
default. For vfio and vfio-approved drivers,
suppress_auto_claim_dma_owner tells the driver core that "this driver
is attached to the device for userspace assignment; do not claim it
for kernel-managed DMA".

> 
> Otherwise, we could call some new interface from .probe() instead of
> adding the flag to struct device_driver.

Most device drivers are of the kernel-managed DMA type. Only a few vfio
and vfio-approved drivers need to use this flag. That's the reason why
we claim kernel-managed DMA by default.

> 
>> +	},
>>   };
>>   
>>   static int __init pci_stub_init(void)
>> -- 
>> 2.25.1
>>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 04/13] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-12-29 21:16     ` Bjorn Helgaas
@ 2021-12-30  5:49       ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-30  5:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Dan Williams, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Bjorn,

On 12/30/21 5:16 AM, Bjorn Helgaas wrote:
> On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
>> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
>> then all of the downstream devices will be part of the same IOMMU group
>> as the bridge. The existing vfio framework allows the portdrv driver to
>> be bound to the bridge while its downstream devices are assigned to user
>> space. The pci_dma_configure() marks the iommu_group as containing only
>> devices with kernel drivers that manage DMA. Avoid this default behavior
>> for the portdrv driver in order for compatibility with the current vfio
>> policy.
> 
> A word about the isolation would be useful.  I think you're referring
> to some specific ACS controls, probably P2P Request Redirect?
> 
> I guess this is just a wording issue, but I think it's actually the
> *lack* of some ACS controls that forces us to put several devices in
> the same IOMMU group, isn't it?  It's not that we start with "IOMMU
> grouping" and that necessitates something else.
> 
> Maybe something like this?
> 
>    If a switch lacks ACS P2P Request Redirect (and possibly other
>    controls?), a device below the switch can bypass the IOMMU and DMA
>    directly to other devices below the switch, so all the downstream
>    devices must be in the same IOMMU group as the switch itself.

Yes. That's what it means from the perspective of PCI/PCIe. I will use
this in the next version. Thanks!

> 
>> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
>> policy to all kernel drivers of bridge class. This is not always safe.
>> For example, The shpchp_core driver relies on the PCI MMIO access for the
>> controller functionality. With its downstream devices assigned to the
>> userspace, the MMIO might be changed through user initiated P2P accesses
>> without any notification. This might break the kernel driver integrity
>> and lead to some unpredictable consequences.
>>
>> For any bridge driver, in order to avoiding default kernel DMA ownership
>> claiming, we should consider:
>>
>>   1) Does the bridge driver use DMA? Calling pci_set_master() or
>>      a dma_map_* API is a sure indicate the driver is doing DMA
>>
>>   2) If the bridge driver uses MMIO, is it tolerant to hostile
>>      userspace also touching the same MMIO registers via P2P DMA
>>      attacks?
>>
>> Conservatively if the driver maps an MMIO region at all, we can say that
>> it fails the test.
> 
> I'm not sure what all this explanation is telling me.  It says
> something done by 5f096b14d421 is not always safe, but this patch
> doesn't fix any of those unsafe things.
> 
> If it doesn't explain why we need this patch or how this patch works,
> I don't think we need it in the commit log.
> 
> Maybe this is an explanation for why you didn't set
> .suppress_auto_claim_dma_owner for shpc_driver?

You are right. This doesn't explain why the patch is needed or how it
works. It only explains why we don't do the same thing for other PCI
port drivers. I will move this out of the commit message. Perhaps put it
in the cover letter or in some of the vfio patches.
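
The two questions from the commit log can be read as a conservative
predicate. This is a sketch only: struct bridge_driver_traits and
may_suppress_auto_claim() are hypothetical names, and the traits would
have to be established by reading each driver, not queried at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a bridge driver does with its device. */
struct bridge_driver_traits {
	bool calls_pci_set_master;	/* enables bus mastering, so may DMA */
	bool uses_dma_map_api;		/* calls a dma_map_* API */
	bool maps_mmio;			/* maps any MMIO region of the device */
};

/*
 * Conservative test: a driver qualifies for skipping the default kernel
 * DMA ownership claim only if it does no DMA and maps no MMIO at all.
 */
static bool may_suppress_auto_claim(const struct bridge_driver_traits *t)
{
	assert(t);
	if (t->calls_pci_set_master || t->uses_dma_map_api)
		return false;	/* question 1: the driver uses DMA */
	return !t->maps_mmio;	/* question 2: any MMIO mapping fails the test */
}
```

A driver that relies on PCI MMIO access, as the commit log says of
shpchp_core, fails this test even if it never does DMA.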

> 
> Minor typos above:
>    s/in order to avoiding default/before avoiding default/
>    s/relies on the PCI MMIO access/relies on PCI MMIO access/
>    s/For example, The/For example, the/
>    s/is a sure indicate the/is a sure indication the/

Thank you! I will correct these.

> 
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/pci/pcie/portdrv_pci.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>> index 35eca6277a96..c48a8734f9c4 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>>   
>>   	.err_handler	= &pcie_portdrv_err_handler,
>>   
>> -	.driver.pm	= PCIE_PORTDRV_PM_OPS,
>> +	.driver		= {
>> +		.pm = PCIE_PORTDRV_PM_OPS,
>> +		.suppress_auto_claim_dma_owner = true,
>> +	},
>>   };
>>   
>>   static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
>> -- 
>> 2.25.1
>>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-30  5:34       ` Lu Baolu
@ 2021-12-30 22:24         ` Bjorn Helgaas
  -1 siblings, 0 replies; 94+ messages in thread
From: Bjorn Helgaas @ 2021-12-30 22:24 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On Thu, Dec 30, 2021 at 01:34:27PM +0800, Lu Baolu wrote:
> Hi Bjorn,
> 
> On 12/30/21 4:42 AM, Bjorn Helgaas wrote:
> > On Fri, Dec 17, 2021 at 02:36:58PM +0800, Lu Baolu wrote:
> > > The pci_dma_configure() marks the iommu_group as containing only devices
> > > with kernel drivers that manage DMA.
> > 
> > I'm looking at pci_dma_configure(), and I don't see the connection to
> > iommu_groups.
> 
> The 2nd patch "driver core: Set DMA ownership during driver bind/unbind"
> sets all drivers' DMA to be kernel-managed by default, except for a few
> that have a driver flag set. So by default, all iommu groups contain
> only devices with kernel drivers managing DMA.

It looks like that happens in device_dma_configure(), not
pci_dma_configure().

> > > Avoid this default behavior for the
> > > pci_stub because it does not program any DMA itself.  This allows the
> > > pci_stub still able to be used by the admin to block driver binding after
> > > applying the DMA ownership to vfio.
> > 
> > > 
> > > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > > ---
> > >   drivers/pci/pci-stub.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> > > index e408099fea52..6324c68602b4 100644
> > > --- a/drivers/pci/pci-stub.c
> > > +++ b/drivers/pci/pci-stub.c
> > > @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
> > >   	.name		= "pci-stub",
> > >   	.id_table	= NULL,	/* only dynamic id's */
> > >   	.probe		= pci_stub_probe,
> > > +	.driver		= {
> > > +		.suppress_auto_claim_dma_owner = true,
> > 
> > The new .suppress_auto_claim_dma_owner controls whether we call
> > iommu_device_set_dma_owner().  I guess you added
> > .suppress_auto_claim_dma_owner because iommu_device_set_dma_owner()
> > must be done *before* we call the driver's .probe() method?
> 
> As explained above, all drivers are set to kernel-managed DMA by
> default. For vfio and vfio-approved drivers,
> suppress_auto_claim_dma_owner tells the driver core that "this driver
> is attached to the device for userspace assignment; do not claim it
> for kernel-managed DMA".
> 
> > Otherwise, we could call some new interface from .probe() instead of
> > adding the flag to struct device_driver.
> 
> Most device drivers are of the kernel-managed DMA type. Only a few vfio
> and vfio-approved drivers need to use this flag. That's the reason why
> we claim kernel-managed DMA by default.

Yes.  But you didn't answer the question of whether this must be done
by a new flag in struct device_driver, or whether it could be done by
having these few VFIO and "VFIO-approved" (whatever that means)
drivers call a new interface.

I was speculating that maybe the DMA ownership claiming must be done
*before* the driver's .probe() method?  If so, that would require a
new flag.  But I don't know whether that's the case.  If DMA
ownership could be claimed by the .probe() method, we wouldn't need
the new flag in struct device_driver.

Bjorn

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind
  2021-12-23  7:23           ` Lu Baolu
@ 2021-12-31  0:36             ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-31  0:36 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Christoph Hellwig, Kevin Tian, Ashok Raj, Will Deacon,
	Robin Murphy, Dan Williams, rafael, Diana Craciun, Cornelia Huck,
	Eric Auger, Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni,
	Stuart Yoder, Laurentiu Tudor, Thierry Reding, David Airlie,
	Daniel Vetter, Jonathan Hunter, Li Yang, Dmitry Osipenko, iommu,
	linux-pci, kvm, linux-kernel

On Thu, Dec 23, 2021 at 03:23:54PM +0800, Lu Baolu wrote:

> > If this is a bug in the existing kernel, please submit it as a separate
> > patch so that it can be properly backported to all affected kernels.
> > Never bury it in an unrelated change that will never get sent to older
> > kernels.
> 
> Sure! I will. Thank you!

I recall looking at this some time ago, and yes, the ordering is not the
strict pairwise error unwind one would expect, but the extra calls
turned out to be harmless. Do check carefully.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-30 22:24         ` Bjorn Helgaas
@ 2021-12-31  0:40           ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2021-12-31  0:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On Thu, Dec 30, 2021 at 04:24:14PM -0600, Bjorn Helgaas wrote:

> I was speculating that maybe the DMA ownership claiming must be done
> *before* the driver's .probe() method?  

This is correct.

> If DMA ownership could be claimed by the .probe() method, we
> wouldn't need the new flag in struct device_driver.

The other requirement is that every existing driver must claim
ownership, so pushing this into the device driver's probe op would
require revising almost every driver in Linux...

In effect the new flag indicates whether the driver will do the DMA
ownership claim in its probe, or should use the default claim the
core code does.

In almost every case a driver should do a claim. A driver like
pci-stub, or a bridge, that doesn't actually operate MMIO on the
device would be the exception.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread
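
Editorial sketch of the flag semantics described in the message above. This
is a userspace model, not kernel code: every model_* type and function is
hypothetical, and only the field name suppress_auto_claim_dma_owner is taken
from the patch under discussion. It assumes the claim fails with -EBUSY when
the group is already owned by userspace, and it shows that the claim decision
is made before probe() is invoked.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; none of these types are the kernel's. */
enum dma_owner { OWNER_NONE, OWNER_KERNEL, OWNER_USER };

struct model_group {
	enum dma_owner owner;
	unsigned int kernel_users;	/* drivers holding the kernel claim */
};

struct model_device {
	struct model_group *group;
	int probed;
};

struct model_driver {
	bool suppress_auto_claim_dma_owner;
	int (*probe)(struct model_device *dev);
};

/* Claim the group for kernel-managed DMA; fails if userspace owns it. */
static int model_claim_kernel_dma(struct model_group *grp)
{
	if (grp->owner == OWNER_USER)
		return -EBUSY;
	grp->owner = OWNER_KERNEL;
	grp->kernel_users++;
	return 0;
}

/* Drop one kernel claim; the group becomes unowned when the last one goes. */
static void model_release_kernel_dma(struct model_group *grp)
{
	if (grp->kernel_users && --grp->kernel_users == 0)
		grp->owner = OWNER_NONE;
}

/* Bind path: the claim is decided before probe() is ever invoked. */
static int model_bind(struct model_device *dev, const struct model_driver *drv)
{
	int ret;

	if (!drv->suppress_auto_claim_dma_owner) {
		ret = model_claim_kernel_dma(dev->group);
		if (ret)
			return ret;	/* probe() is never reached */
	}

	ret = drv->probe(dev);
	if (ret && !drv->suppress_auto_claim_dma_owner)
		model_release_kernel_dma(dev->group);
	return ret;
}

/* Trivial probe used by both the ordinary and the stub-like driver. */
static int model_probe_ok(struct model_device *dev)
{
	dev->probed = 1;
	return 0;
}
```

In this model an ordinary driver cannot bind into a user-owned group, while a
driver that sets the flag (as pci-stub does in the patch) binds without
changing the group's owner.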

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-30 22:24         ` Bjorn Helgaas
@ 2021-12-31  1:06           ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-31  1:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Dan Williams, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On 12/31/21 6:24 AM, Bjorn Helgaas wrote:
> On Thu, Dec 30, 2021 at 01:34:27PM +0800, Lu Baolu wrote:
>> Hi Bjorn,
>>
>> On 12/30/21 4:42 AM, Bjorn Helgaas wrote:
>>> On Fri, Dec 17, 2021 at 02:36:58PM +0800, Lu Baolu wrote:
>>>> The pci_dma_configure() marks the iommu_group as containing only devices
>>>> with kernel drivers that manage DMA.
>>>
>>> I'm looking at pci_dma_configure(), and I don't see the connection to
>>> iommu_groups.
>>
>> The 2nd patch "driver core: Set DMA ownership during driver bind/unbind"
>> sets all drivers' DMA to be kernel-managed by default except a few ones
>> which has a driver flag set. So by default, all iommu groups contains
>> only devices with kernel drivers managing DMA.
> 
> It looks like that happens in device_dma_configure(), not
> pci_dma_configure().
> 
>>>> Avoid this default behavior for the
>>>> pci_stub because it does not program any DMA itself.  This allows the
>>>> pci_stub still able to be used by the admin to block driver binding after
>>>> applying the DMA ownership to vfio.
>>>
>>>>
>>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>>> ---
>>>>    drivers/pci/pci-stub.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
>>>> index e408099fea52..6324c68602b4 100644
>>>> --- a/drivers/pci/pci-stub.c
>>>> +++ b/drivers/pci/pci-stub.c
>>>> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>>>>    	.name		= "pci-stub",
>>>>    	.id_table	= NULL,	/* only dynamic id's */
>>>>    	.probe		= pci_stub_probe,
>>>> +	.driver		= {
>>>> +		.suppress_auto_claim_dma_owner = true,
>>>
>>> The new .suppress_auto_claim_dma_owner controls whether we call
>>> iommu_device_set_dma_owner().  I guess you added
>>> .suppress_auto_claim_dma_owner because iommu_device_set_dma_owner()
>>> must be done *before* we call the driver's .probe() method?
>>
>> As explained above, all drivers are set to kernel-managed dma by
>> default. For those vfio and vfio-approved drivers,
>> suppress_auto_claim_dma_owner is used to tell the driver core that "this
>> driver is attached to device for userspace assignment purpose, do not
>> claim it for kernel-management dma".
>>
>>> Otherwise, we could call some new interface from .probe() instead of
>>> adding the flag to struct device_driver.
>>
>> Most device drivers are of the kernel-managed DMA type. Only a few vfio
>> and vfio-approved drivers need to use this flag. That's the reason why
>> we claim kernel-managed DMA by default.
> 
> Yes.  But you didn't answer the question of whether this must be done
> by a new flag in struct device_driver, or whether it could be done by
> having these few VFIO and "VFIO-approved" (whatever that means)
> drivers call a new interface.
> 
> I was speculating that maybe the DMA ownership claiming must be done
> *before* the driver's .probe() method?  If so, that would require a
> new flag.  But I don't know whether that's the case.  If DMA
> ownership could be claimed by the .probe() method, we wouldn't need
> the new flag in struct device_driver.

Yes. It's feasible. Hence we can remove the suppress flag which is only
for some special drivers. I will come up with a new version so that you
can further comment with the real code. Thank you!

> 
> Bjorn
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-31  0:40           ` Jason Gunthorpe via iommu
@ 2021-12-31  1:10             ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-31  1:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

Hi Jason,

On 12/31/21 8:40 AM, Jason Gunthorpe wrote:
> On Thu, Dec 30, 2021 at 04:24:14PM -0600, Bjorn Helgaas wrote:
> 
>> I was speculating that maybe the DMA ownership claiming must be done
>> *before* the driver's .probe() method?
> 
> This is correct.
> 
>> If DMA ownership could be claimed by the .probe() method, we
>> wouldn't need the new flag in struct device_driver.
> 
> The other requirement is that every existing driver must claim
> ownership, so pushing this into the device driver's probe op would
> require revising almost every driver in Linux...
> 
> In effect the new flag indicates whether the driver will do the DMA
> ownership claim in its probe, or should use the default claim the
> core code does.
> 
> In almost every case a driver should do a claim. A driver like
> pci-stub, or a bridge, that doesn't actually operate MMIO on the
> device would be the exception.

We still need to call iommu_device_use_dma_api() in bus dma_configure()
callback. But we can call iommu_device_unuse_dma_api() in the .probe()
of vfio (and vfio-approved) drivers, so that we don't need the new flag
anymore.

> 
> Jason
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-31  1:10             ` Lu Baolu
@ 2021-12-31  1:58               ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2021-12-31  1:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On 12/31/21 9:10 AM, Lu Baolu wrote:
> 
> On 12/31/21 8:40 AM, Jason Gunthorpe wrote:
>> On Thu, Dec 30, 2021 at 04:24:14PM -0600, Bjorn Helgaas wrote:
>>
>>> I was speculating that maybe the DMA ownership claiming must be done
>>> *before* the driver's .probe() method?
>>
>> This is correct.
>>
>>> If DMA ownership could be claimed by the .probe() method, we
>>> wouldn't need the new flag in struct device_driver.
>>
>> The other requirement is that every existing driver must claim
>> ownership, so pushing this into the device driver's probe op would
>> require revising almost every driver in Linux...
>>
>> In effect the new flag indicates whether the driver will do the DMA
>> ownership claim in its probe, or should use the default claim the
>> core code does.
>>
>> In almost every case a driver should do a claim. A driver like
>> pci-stub, or a bridge, that doesn't actually operate MMIO on the
>> device would be the exception.
> 
> We still need to call iommu_device_use_dma_api() in bus dma_configure()
> callback. But we can call iommu_device_unuse_dma_api() in the .probe()
> of vfio (and vfio-approved) drivers, so that we don't need the new flag
> anymore.

Oh, wait. I didn't think about the hot-plug case. If we call
iommu_device_use_dma_api() in bus dma_configure() anyway, we can't bind
any driver (vfio or non-vfio) to a device if its group has
already been assigned to user space. It seems that we can't omit this
flag.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-12-31  1:10             ` Lu Baolu
@ 2022-01-03 19:53               ` Jason Gunthorpe via iommu
  -1 siblings, 0 replies; 94+ messages in thread
From: Jason Gunthorpe @ 2022-01-03 19:53 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Bjorn Helgaas, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Christoph Hellwig, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On Fri, Dec 31, 2021 at 09:10:43AM +0800, Lu Baolu wrote:

> We still need to call iommu_device_use_dma_api() in bus dma_configure()
> callback. But we can call iommu_device_unuse_dma_api() in the .probe()
> of vfio (and vfio-approved) drivers, so that we don't need the new flag
> anymore.

No, we can't. The action that iommu_device_use_dma_api() takes is to
not call probe; that obviously cannot be undone by code inside probe.

Jason

^ permalink raw reply	[flat|nested] 94+ messages in thread
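
Editorial sketch of the objection in the message above. This is a userspace
model, not kernel code: the alt_* names are hypothetical and only loosely
mirror the iommu_device_use_dma_api() / iommu_device_unuse_dma_api() pair
named in the thread. It assumes the unconditional claim fails with -EBUSY on
a user-owned group, and it shows that a release placed inside probe() never
runs in exactly the case where it would be needed.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's structures. */
struct alt_group {
	bool user_owned;		/* group already assigned to userspace */
	unsigned int kernel_users;	/* outstanding kernel DMA API claims */
};

struct alt_device {
	struct alt_group *group;
	int probe_calls;
};

/* Unconditional claim made by the (modelled) bus dma_configure step. */
static int alt_use_dma_api(struct alt_group *grp)
{
	if (grp->user_owned)
		return -EBUSY;
	grp->kernel_users++;
	return 0;
}

/* Release of a claim previously taken by alt_use_dma_api(). */
static void alt_unuse_dma_api(struct alt_group *grp)
{
	if (grp->kernel_users)
		grp->kernel_users--;
}

/* probe() of a hypothetical vfio-like driver that tries to undo the claim. */
static int alt_vfio_like_probe(struct alt_device *dev)
{
	dev->probe_calls++;
	alt_unuse_dma_api(dev->group);
	return 0;
}

/* Proposed flag-less bind path: always claim first, then call probe(). */
static int alt_bind(struct alt_device *dev,
		    int (*probe)(struct alt_device *dev))
{
	int ret = alt_use_dma_api(dev->group);

	if (ret)
		return ret;	/* probe(), and its undo, never execute */
	return probe(dev);
}
```

With a free group the undo works, but binding the vfio-like driver to a
device in a user-owned group returns -EBUSY with probe_calls still zero, which
is the hot-plug failure noted earlier in the thread.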

* Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
  2021-12-24  2:50                   ` Jason Gunthorpe via iommu
@ 2022-01-04  1:53                     ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2022-01-04  1:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Robin Murphy, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Dan Williams, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On 12/24/21 10:50 AM, Jason Gunthorpe wrote:
>>> We don't need _USER anymore because iommu_group_set_dma_owner() always
>>> does detach, and iommu_replace_group_domain() avoids ever reassigning
>>> default_domain. The special USER behavior falls out automatically.
>> This means we will grow more group-centric interfaces. My understanding
>> is the opposite that we should hide the concept of group in IOMMU
>> subsystem, and the device drivers only faces device specific interfaces.
> Ideally group interfaces would be reduced, but in this case VFIO needs
> the group. It has sort of a fundamental problem with its uAPI that
> expects the container is fully setup with a domain at the moment the
> group is attached. So deferring domain setup to when the device is
> available becomes a user visible artifact - and if this is important
> or not is a whole research question that isn't really that important
> for this series.
> 
> We also can't just pull a device out of thin air, a device that hasn't
> been probed() hasn't even had dma_configure called! Let alone the
> lifetime and locking problems with that kind of idea.
> 
> So.. leaving it as a group interface makes the most sense,
> particularly for this series which is really about fixing the sharing
> model in the iommu core and deleting the BUG_ONs.

I feel it makes more sense to move the attach_device/group
refactoring patches into a separate series. I will come up with this
new series so that people can review and comment on the real code.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2022-01-03 19:53               ` Jason Gunthorpe via iommu
@ 2022-01-04  1:54                 ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2022-01-04  1:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Bjorn Helgaas, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Christoph Hellwig, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Dan Williams, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On 1/4/22 3:53 AM, Jason Gunthorpe wrote:
> On Fri, Dec 31, 2021 at 09:10:43AM +0800, Lu Baolu wrote:
> 
>> We still need to call iommu_device_use_dma_api() in bus dma_configure()
>> callback. But we can call iommu_device_unuse_dma_api() in the .probe()
>> of vfio (and vfio-approved) drivers, so that we don't need the new flag
>> anymore.
> 
> No, we can't. The action that iommu_device_use_dma_api() takes is to
> not call probe, it obviously cannot be undone by code inside probe.

Yes. Agreed.

> Jason
> 

Best regards,
baolu
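
The ordering argument in the exchange above can be sketched as a small toy model. This is only an illustration under stated assumptions: every toy_* name is hypothetical and none of this is the kernel's real driver-core code. It shows the ownership check running before probe, so that on a conflict probe is never entered and nothing inside probe could undo the decision.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy device; not a real struct device. */
struct toy_device {
	bool group_owned_by_user;	/* a device in the same group is assigned to userspace */
	bool probed;			/* set once the driver's probe has run */
};

/* Toy analogue of the ownership check made when DMA is configured. */
static int toy_use_dma_api(const struct toy_device *dev)
{
	return dev->group_owned_by_user ? -EBUSY : 0;
}

/* Toy driver probe; it only records that it ran. */
static int toy_driver_probe(struct toy_device *dev)
{
	dev->probed = true;
	return 0;
}

/* Toy bind path: check ownership first, call probe only on success. */
static int toy_bind(struct toy_device *dev)
{
	int ret = toy_use_dma_api(dev);

	if (ret)
		return ret;	/* conflict: probe is never called */
	return toy_driver_probe(dev);
}
```

Calling toy_bind() on a device whose group_owned_by_user flag is set returns -EBUSY and leaves probed false, which is the point made in the reply: the effect of the check is that probe does not run at all.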


* Re: [PATCH v4 00/13] Fix BUG_ON in vfio_iommu_group_notifier()
  2021-12-17  6:36 ` Lu Baolu
@ 2022-01-04  5:23   ` Lu Baolu
  -1 siblings, 0 replies; 94+ messages in thread
From: Lu Baolu @ 2022-01-04  5:23 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Christoph Hellwig, Kevin Tian, Ashok Raj
  Cc: baolu.lu, Will Deacon, Robin Murphy, Dan Williams, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, Stuart Yoder, Laurentiu Tudor,
	Thierry Reding, David Airlie, Daniel Vetter, Jonathan Hunter,
	Li Yang, Dmitry Osipenko, iommu, linux-pci, kvm, linux-kernel

On 12/17/21 2:36 PM, Lu Baolu wrote:
> Hi folks,
> 
> The iommu group is the minimal isolation boundary for DMA. Devices in
> a group can access each other's MMIO registers via peer to peer DMA
> and also need to share the same I/O address space.
> 
> Once the I/O address space is assigned to user control it is no longer
> available to the dma_map* API, which effectively makes the DMA API
> non-working.
> 
> Second, userspace can use DMA initiated by a device that it controls
> to access the MMIO spaces of other devices in the group. This allows
> userspace to indirectly attack any kernel owned device and its driver.
> 
> Therefore groups must either be entirely under kernel control or
> userspace control, never a mixture. Unfortunately some systems have
> problems with the granularity of groups and there are a couple of
> important exceptions:
> 
>   - pci_stub allows the admin to block driver binding on a device and
>     make it permanently shared with userspace. Since PCI stub does not
>     do DMA it is safe, however the admin must understand that using
>     pci_stub allows userspace to attack whatever device it was bound
>     to.
> 
>   - PCI bridges are sometimes included in groups. Typically PCI bridges
>     do not use DMA, and generally do not have MMIO regions.
> 
> Generally any device that does not have any MMIO registers is a
> possible candidate for an exception.
> 
> Currently vfio adopts a workaround to detect violations of the above
> restrictions by monitoring the driver core BOUND event, and hardwiring
> the above exceptions. Since there is no way for vfio to reject driver
> binding at this point, BUG_ON() is triggered if a violation is
> captured (kernel driver BOUND event on a group which already has some
> devices assigned to userspace). Aside from the bad user experience
> this opens a way for root userspace to crash the kernel, even in high
> integrity configurations, by manipulating the module binding and
> triggering the BUG_ON.
> 
> This series solves this problem by making the user/kernel ownership a
> core concept at the IOMMU layer. The driver core enforces kernel
> ownership while drivers are bound and violations now result in error
> codes during probe, not BUG_ON failures.
> 
> Patch partitions:
>    [PATCH 1-4]: Detect DMA ownership conflicts during driver binding;
>    [PATCH 5-8]: Add security context management for assigned devices;
>    [PATCH 9-13]: Various cleanups.
> 
> This is also part one of three initial series for IOMMUFD:
>   * Move IOMMU Group security into the iommu layer
>   - Generic IOMMUFD implementation
>   - VFIO ability to consume IOMMUFD
> 
> Change log:
> v1: initial post
>    - https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/
> 
> v2:
>    - https://lore.kernel.org/linux-iommu/20211128025051.355578-1-baolu.lu@linux.intel.com/
> 
>    - Move kernel dma ownership auto-claiming from driver core to bus
>      callback. [Greg/Christoph/Robin/Jason]
>      https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#m153706912b770682cb12e3c28f57e171aa1f9d0c
> 
>    - Code and interface refactoring for iommu_set/release_dma_owner()
>      interfaces. [Jason]
>      https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#mea70ed8e4e3665aedf32a5a0a7db095bf680325e
> 
>    - [NEW]Add new iommu_attach/detach_device_shared() interfaces for
>      multiple devices group. [Robin/Jason]
>      https://lore.kernel.org/linux-iommu/20211115020552.2378167-1-baolu.lu@linux.intel.com/T/#mea70ed8e4e3665aedf32a5a0a7db095bf680325e
> 
>    - [NEW]Use iommu_attach/detach_device_shared() in drm/tegra drivers.
> 
>    - Refactoring and description refinement.
> 
> v3:
>    - https://lore.kernel.org/linux-iommu/20211206015903.88687-1-baolu.lu@linux.intel.com/
> 
>    - Rename bus_type::dma_unconfigure to bus_type::dma_cleanup. [Greg]
>      https://lore.kernel.org/linux-iommu/c3230ace-c878-39db-1663-2b752ff5384e@linux.intel.com/T/#m6711e041e47cb0cbe3964fad0a3466f5ae4b3b9b
> 
>    - Avoid _platform_dma_configure for platform_bus_type::dma_configure.
>      [Greg]
>      https://lore.kernel.org/linux-iommu/c3230ace-c878-39db-1663-2b752ff5384e@linux.intel.com/T/#m43fc46286611aa56a5c0eeaad99d539e5519f3f6
> 
>    - Patch "0012-iommu-Add-iommu_at-de-tach_device_shared-for-mult.patch"
>      and "0018-drm-tegra-Use-the-iommu-dma_owner-mechanism.patch" have
>      been tested by Dmitry Osipenko <digetx@gmail.com>.
> 
> v4:
>    - Remove unnecessary tegra->domain check in the tegra patch. (Jason)
>    - Remove DMA_OWNER_NONE. (Joerg)
>    - Change refcount to unsigned int. (Christoph)
>    - Move mutex lock into group set_dma_owner functions. (Christoph)
>    - Add kernel doc for iommu_attach/detach_domain_shared(). (Christoph)
>    - Move dma auto-claim into driver core. (Jason/Christoph)

Thank you very much for the review comments. A new version has been
posted.

https://lore.kernel.org/linux-iommu/20220104015644.2294354-1-baolu.lu@linux.intel.com/

Best regards,
baolu
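
The ownership rule from the quoted cover letter (a group is either entirely kernel-owned or entirely user-owned, and a conflicting claim gets an error code instead of BUG_ON) can be sketched as a toy model. This is a minimal illustration, not the code from the series: the toy_* names and the two-value enum are assumptions made for the example.

```c
#include <errno.h>

/* Hypothetical stand-ins; not the kernel's real types or functions. */
enum toy_owner { TOY_OWNER_KERNEL, TOY_OWNER_USER };

struct toy_group {
	enum toy_owner owner;	/* meaningful only while refcnt > 0 */
	unsigned int refcnt;	/* number of outstanding claims on the group */
};

/* Claim the group for one kind of owner; mixed ownership is refused. */
static int toy_group_claim(struct toy_group *g, enum toy_owner owner)
{
	if (g->refcnt && g->owner != owner)
		return -EBUSY;	/* an error code rather than a crash */
	g->owner = owner;
	g->refcnt++;
	return 0;
}

/* Drop one claim; the group becomes free again when refcnt reaches 0. */
static void toy_group_release(struct toy_group *g)
{
	if (g->refcnt)
		g->refcnt--;
}
```

In this model, a user claim followed by a kernel claim on the same group fails with -EBUSY, and the kernel claim succeeds only after every user claim has been released.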


end of thread

Thread overview: 94+ messages
2021-12-17  6:36 [PATCH v4 00/13] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
2021-12-17  6:36 ` Lu Baolu
2021-12-17  6:36 ` [PATCH v4 01/13] iommu: Add device dma ownership set/release interfaces Lu Baolu
2021-12-17  6:36   ` Lu Baolu
2021-12-17  6:36 ` [PATCH v4 02/13] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
2021-12-17  6:36   ` Lu Baolu
2021-12-22 12:47   ` Greg Kroah-Hartman
2021-12-22 12:47     ` Greg Kroah-Hartman
2021-12-22 17:52     ` Jason Gunthorpe
2021-12-22 17:52       ` Jason Gunthorpe via iommu
2021-12-23  2:08     ` Lu Baolu
2021-12-23  2:08       ` Lu Baolu
2021-12-23  3:02     ` Lu Baolu
2021-12-23  3:02       ` Lu Baolu
2021-12-23  7:13       ` Greg Kroah-Hartman
2021-12-23  7:13         ` Greg Kroah-Hartman
2021-12-23  7:23         ` Lu Baolu
2021-12-23  7:23           ` Lu Baolu
2021-12-31  0:36           ` Jason Gunthorpe
2021-12-31  0:36             ` Jason Gunthorpe via iommu
2021-12-17  6:36 ` [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
2021-12-17  6:36   ` Lu Baolu
2021-12-29 20:42   ` Bjorn Helgaas
2021-12-29 20:42     ` Bjorn Helgaas
2021-12-30  5:34     ` Lu Baolu
2021-12-30  5:34       ` Lu Baolu
2021-12-30 22:24       ` Bjorn Helgaas
2021-12-30 22:24         ` Bjorn Helgaas
2021-12-31  0:40         ` Jason Gunthorpe
2021-12-31  0:40           ` Jason Gunthorpe via iommu
2021-12-31  1:10           ` Lu Baolu
2021-12-31  1:10             ` Lu Baolu
2021-12-31  1:58             ` Lu Baolu
2021-12-31  1:58               ` Lu Baolu
2022-01-03 19:53             ` Jason Gunthorpe
2022-01-03 19:53               ` Jason Gunthorpe via iommu
2022-01-04  1:54               ` Lu Baolu
2022-01-04  1:54                 ` Lu Baolu
2021-12-31  1:06         ` Lu Baolu
2021-12-31  1:06           ` Lu Baolu
2021-12-17  6:36 ` [PATCH v4 04/13] PCI: portdrv: " Lu Baolu
2021-12-17  6:36   ` Lu Baolu
2021-12-29 21:16   ` Bjorn Helgaas
2021-12-29 21:16     ` Bjorn Helgaas
2021-12-30  5:49     ` Lu Baolu
2021-12-30  5:49       ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 05/13] iommu: Add security context management for assigned devices Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 06/13] iommu: Expose group variants of dma ownership interfaces Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-21 16:50   ` Robin Murphy
2021-12-21 16:50     ` Robin Murphy
2021-12-21 18:46     ` Jason Gunthorpe
2021-12-21 18:46       ` Jason Gunthorpe via iommu
2021-12-22  4:22       ` Lu Baolu
2021-12-22  4:22         ` Lu Baolu
2021-12-22  4:25         ` Lu Baolu
2021-12-22  4:25           ` Lu Baolu
2021-12-22 20:26       ` Robin Murphy
2021-12-22 20:26         ` Robin Murphy
2021-12-23  0:57         ` Jason Gunthorpe
2021-12-23  0:57           ` Jason Gunthorpe via iommu
2021-12-23  5:53           ` Lu Baolu
2021-12-23  5:53             ` Lu Baolu
2021-12-23 14:03             ` Jason Gunthorpe
2021-12-23 14:03               ` Jason Gunthorpe via iommu
2021-12-24  1:30               ` Lu Baolu
2021-12-24  1:30                 ` Lu Baolu
2021-12-24  2:50                 ` Jason Gunthorpe
2021-12-24  2:50                   ` Jason Gunthorpe via iommu
2021-12-24  6:44                   ` Lu Baolu
2021-12-24  6:44                     ` Lu Baolu
2022-01-04  1:53                   ` Lu Baolu
2022-01-04  1:53                     ` Lu Baolu
2021-12-24  3:19         ` Lu Baolu
2021-12-24  3:19           ` Lu Baolu
2021-12-24 14:24           ` Jason Gunthorpe
2021-12-24 14:24             ` Jason Gunthorpe via iommu
2021-12-17  6:37 ` [PATCH v4 08/13] vfio: Set DMA USER ownership for VFIO devices Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 09/13] vfio: Remove use of vfio_group_viable() Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 10/13] vfio: Delete the unbound_list Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 11/13] vfio: Remove iommu group notifier Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 12/13] iommu: Remove iommu group changes notifier Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2021-12-17  6:37 ` [PATCH v4 13/13] drm/tegra: Use the iommu dma_owner mechanism Lu Baolu
2021-12-17  6:37   ` Lu Baolu
2022-01-04  5:23 ` [PATCH v4 00/13] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
2022-01-04  5:23   ` Lu Baolu
