linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/7] iommu cleanup and refactoring
@ 2022-01-24  7:10 Lu Baolu
  2022-01-24  7:10 ` [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks Lu Baolu
                   ` (8 more replies)
  0 siblings, 9 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:10 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

Hi,

The guest pasid and aux-domain related code are dead code in the current
iommu subtree. As we have reached a consensus that all these features
should be based on the new iommufd framework (which is under active
development), the first part of this series removes and cleans up all
of the dead code.

The second part of this series refactors the iommu_domain by moving all
domain-specific ops from iommu_ops to a new domain_ops. This makes an
iommu_domain self-contained, so that it represents the abstraction of an
I/O translation table in the IOMMU subsystem. With different types of
iommu_domain providing different sets of ops, it becomes easier to
support more types of I/O translation tables.

Please help to review and comment.

Best regards,
baolu

Lu Baolu (7):
  iommu/vt-d: Remove guest pasid related callbacks
  iommu: Remove guest pasid related interfaces and definitions
  iommu/vt-d: Remove aux-domain related callbacks
  iommu: Remove aux-domain related interfaces and iommu_ops
  drm/nouveau/device: Get right pgsize_bitmap of iommu_domain
  iommu: Use right way to retrieve iommu_ops
  iommu: Add iommu_domain::domain_ops

 include/linux/intel-iommu.h                   |  27 -
 include/linux/intel-svm.h                     |  12 -
 include/linux/iommu.h                         | 174 ++----
 drivers/iommu/intel/pasid.h                   |   4 -
 include/uapi/linux/iommu.h                    | 181 ------
 .../drm/nouveau/nvkm/engine/device/tegra.c    |   2 +-
 drivers/iommu/amd/iommu.c                     |  21 +-
 drivers/iommu/apple-dart.c                    |  24 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  22 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |  23 +-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c       |  17 +-
 drivers/iommu/exynos-iommu.c                  |  17 +-
 drivers/iommu/fsl_pamu_domain.c               |  13 +-
 drivers/iommu/intel/debugfs.c                 |   3 +-
 drivers/iommu/intel/iommu.c                   | 542 +-----------------
 drivers/iommu/intel/pasid.c                   | 161 ------
 drivers/iommu/intel/svm.c                     | 209 -------
 drivers/iommu/iommu.c                         | 296 +---------
 drivers/iommu/ipmmu-vmsa.c                    |  21 +-
 drivers/iommu/msm_iommu.c                     |  17 +-
 drivers/iommu/mtk_iommu.c                     |  24 +-
 drivers/iommu/mtk_iommu_v1.c                  |  19 +-
 drivers/iommu/omap-iommu.c                    |  15 +-
 drivers/iommu/rockchip-iommu.c                |  17 +-
 drivers/iommu/s390-iommu.c                    |  15 +-
 drivers/iommu/sprd-iommu.c                    |  19 +-
 drivers/iommu/sun50i-iommu.c                  |  18 +-
 drivers/iommu/tegra-gart.c                    |  15 +-
 drivers/iommu/tegra-smmu.c                    |  16 +-
 drivers/iommu/virtio-iommu.c                  |  18 +-
 30 files changed, 331 insertions(+), 1631 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
@ 2022-01-24  7:10 ` Lu Baolu
  2022-01-24  9:25   ` Christoph Hellwig
  2022-01-24  7:10 ` [PATCH 2/7] iommu: Remove guest pasid related interfaces and definitions Lu Baolu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:10 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

The guest pasid related callbacks are not called anywhere in the tree.
Remove them to avoid carrying dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/intel-iommu.h |  10 --
 include/linux/intel-svm.h   |  12 ---
 drivers/iommu/intel/pasid.h |   4 -
 drivers/iommu/intel/iommu.c | 208 -----------------------------------
 drivers/iommu/intel/pasid.c | 161 ---------------------------
 drivers/iommu/intel/svm.c   | 209 ------------------------------------
 6 files changed, 604 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 69230fd695ea..beaacb589abd 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -525,12 +525,6 @@ struct context_entry {
  */
 #define DOMAIN_FLAG_USE_FIRST_LEVEL		BIT(1)
 
-/*
- * Domain represents a virtual machine which demands iommu nested
- * translation mode support.
- */
-#define DOMAIN_FLAG_NESTING_MODE		BIT(2)
-
 struct dmar_domain {
 	int	nid;			/* node id */
 
@@ -765,9 +759,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
 extern void intel_svm_check(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
-			  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
@@ -795,7 +786,6 @@ struct intel_svm {
 
 	unsigned int flags;
 	u32 pasid;
-	int gpasid; /* In case that guest PASID is different from host PASID */
 	struct list_head devs;
 };
 #else
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 1b73bab7eeff..b3b125b332aa 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -25,17 +25,5 @@
  * do such IOTLB flushes automatically.
  */
 #define SVM_FLAG_SUPERVISOR_MODE	BIT(0)
-/*
- * The SVM_FLAG_GUEST_MODE flag is used when a PASID bind is for guest
- * processes. Compared to the host bind, the primary differences are:
- * 1. mm life cycle management
- * 2. fault reporting
- */
-#define SVM_FLAG_GUEST_MODE		BIT(1)
-/*
- * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
- * which requires guest and host PASID translation at both directions.
- */
-#define SVM_FLAG_GUEST_PASID		BIT(2)
 
 #endif /* __INTEL_SVM_H__ */
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d5552e2c160d..ab4408c824a5 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -118,10 +118,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
-int intel_pasid_setup_nested(struct intel_iommu *iommu,
-			     struct device *dev, pgd_t *pgd, u32 pasid,
-			     struct iommu_gpasid_bind_data_vtd *pasid_data,
-			     struct dmar_domain *domain, int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 92fea3fbbb11..4f9d95067c8f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4825,13 +4825,6 @@ static int prepare_domain_attach_device(struct iommu_domain *domain,
 	if (!iommu)
 		return -ENODEV;
 
-	if ((dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE) &&
-	    !ecap_nest(iommu->ecap)) {
-		dev_err(dev, "%s: iommu not support nested translation\n",
-			iommu->name);
-		return -EINVAL;
-	}
-
 	/* check if this iommu agaw is sufficient for max mapped address */
 	addr_width = agaw_to_width(iommu->agaw);
 	if (addr_width > cap_mgaw(iommu->cap))
@@ -4919,185 +4912,6 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
-/*
- * 2D array for converting and sanitizing IOMMU generic TLB granularity to
- * VT-d granularity. Invalidation is typically included in the unmap operation
- * as a result of DMA or VFIO unmap. However, for assigned devices guest
- * owns the first level page tables. Invalidations of translation caches in the
- * guest are trapped and passed down to the host.
- *
- * vIOMMU in the guest will only expose first level page tables, therefore
- * we do not support IOTLB granularity for request without PASID (second level).
- *
- * For example, to find the VT-d granularity encoding for IOTLB
- * type and page selective granularity within PASID:
- * X: indexed by iommu cache type
- * Y: indexed by enum iommu_inv_granularity
- * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
- */
-
-static const int
-inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
-	/*
-	 * PASID based IOTLB invalidation: PASID selective (per PASID),
-	 * page selective (address granularity)
-	 */
-	{-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
-	/* PASID based dev TLBs */
-	{-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
-	/* PASID cache */
-	{-EINVAL, -EINVAL, -EINVAL}
-};
-
-static inline int to_vtd_granularity(int type, int granu)
-{
-	return inv_type_granu_table[type][granu];
-}
-
-static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
-{
-	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
-
-	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
-	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
-	 * granu size in contiguous memory.
-	 */
-	return order_base_2(nr_pages);
-}
-
-static int
-intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
-			   struct iommu_cache_invalidate_info *inv_info)
-{
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	struct device_domain_info *info;
-	struct intel_iommu *iommu;
-	unsigned long flags;
-	int cache_type;
-	u8 bus, devfn;
-	u16 did, sid;
-	int ret = 0;
-	u64 size = 0;
-
-	if (!inv_info || !dmar_domain)
-		return -EINVAL;
-
-	if (!dev || !dev_is_pci(dev))
-		return -ENODEV;
-
-	iommu = device_to_iommu(dev, &bus, &devfn);
-	if (!iommu)
-		return -ENODEV;
-
-	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
-		return -EINVAL;
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	spin_lock(&iommu->lock);
-	info = get_domain_info(dev);
-	if (!info) {
-		ret = -EINVAL;
-		goto out_unlock;
-	}
-	did = dmar_domain->iommu_did[iommu->seq_id];
-	sid = PCI_DEVID(bus, devfn);
-
-	/* Size is only valid in address selective invalidation */
-	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-		size = to_vtd_size(inv_info->granu.addr_info.granule_size,
-				   inv_info->granu.addr_info.nb_granules);
-
-	for_each_set_bit(cache_type,
-			 (unsigned long *)&inv_info->cache,
-			 IOMMU_CACHE_INV_TYPE_NR) {
-		int granu = 0;
-		u64 pasid = 0;
-		u64 addr = 0;
-
-		granu = to_vtd_granularity(cache_type, inv_info->granularity);
-		if (granu == -EINVAL) {
-			pr_err_ratelimited("Invalid cache type and granu combination %d/%d\n",
-					   cache_type, inv_info->granularity);
-			break;
-		}
-
-		/*
-		 * PASID is stored in different locations based on the
-		 * granularity.
-		 */
-		if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-		    (inv_info->granu.pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-			pasid = inv_info->granu.pasid_info.pasid;
-		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-			 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
-			pasid = inv_info->granu.addr_info.pasid;
-
-		switch (BIT(cache_type)) {
-		case IOMMU_CACHE_INV_TYPE_IOTLB:
-			/* HW will ignore LSB bits based on address mask */
-			if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-			    size &&
-			    (inv_info->granu.addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
-				pr_err_ratelimited("User address not aligned, 0x%llx, size order %llu\n",
-						   inv_info->granu.addr_info.addr, size);
-			}
-
-			/*
-			 * If granu is PASID-selective, address is ignored.
-			 * We use npages = -1 to indicate that.
-			 */
-			qi_flush_piotlb(iommu, did, pasid,
-					mm_to_dma_pfn(inv_info->granu.addr_info.addr),
-					(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
-					inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
-
-			if (!info->ats_enabled)
-				break;
-			/*
-			 * Always flush device IOTLB if ATS is enabled. vIOMMU
-			 * in the guest may assume IOTLB flush is inclusive,
-			 * which is more efficient.
-			 */
-			fallthrough;
-		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
-			/*
-			 * PASID based device TLB invalidation does not support
-			 * IOMMU_INV_GRANU_PASID granularity but only supports
-			 * IOMMU_INV_GRANU_ADDR.
-			 * The equivalent of that is we set the size to be the
-			 * entire range of 64 bit. User only provides PASID info
-			 * without address info. So we set addr to 0.
-			 */
-			if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
-				size = 64 - VTD_PAGE_SHIFT;
-				addr = 0;
-			} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
-				addr = inv_info->granu.addr_info.addr;
-			}
-
-			if (info->ats_enabled)
-				qi_flush_dev_iotlb_pasid(iommu, sid,
-						info->pfsid, pasid,
-						info->ats_qdep, addr,
-						size);
-			else
-				pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
-			break;
-		default:
-			dev_err_ratelimited(dev, "Unsupported IOMMU invalidation type %d\n",
-					    cache_type);
-			ret = -EINVAL;
-		}
-	}
-out_unlock:
-	spin_unlock(&iommu->lock);
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	return ret;
-}
-#endif
-
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot, gfp_t gfp)
@@ -5545,24 +5359,6 @@ static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
 	return attach_deferred(dev);
 }
 
-static int
-intel_iommu_enable_nesting(struct iommu_domain *domain)
-{
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	unsigned long flags;
-	int ret = -ENODEV;
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	if (list_empty(&dmar_domain->devices)) {
-		dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
-		dmar_domain->flags &= ~DOMAIN_FLAG_USE_FIRST_LEVEL;
-		ret = 0;
-	}
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	return ret;
-}
-
 /*
  * Check that the device does not live on an external facing PCI port that is
  * marked as untrusted. Such devices should not be able to apply quirks and
@@ -5599,7 +5395,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_free		= intel_iommu_domain_free,
-	.enable_nesting		= intel_iommu_enable_nesting,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
 	.aux_attach_dev		= intel_iommu_aux_attach_device,
@@ -5624,9 +5419,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.pgsize_bitmap		= SZ_4K,
 #ifdef CONFIG_INTEL_IOMMU_SVM
-	.cache_invalidate	= intel_iommu_sva_invalidate,
-	.sva_bind_gpasid	= intel_svm_bind_gpasid,
-	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
 	.sva_bind		= intel_svm_bind,
 	.sva_unbind		= intel_svm_unbind,
 	.sva_get_pasid		= intel_svm_get_pasid,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 07c390aed1fe..10fb82ea467d 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -762,164 +762,3 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
 	return 0;
 }
-
-static int
-intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
-			    struct iommu_gpasid_bind_data_vtd *pasid_data)
-{
-	/*
-	 * Not all guest PASID table entry fields are passed down during bind,
-	 * here we only set up the ones that are dependent on guest settings.
-	 * Execution related bits such as NXE, SMEP are not supported.
-	 * Other fields, such as snoop related, are set based on host needs
-	 * regardless of guest settings.
-	 */
-	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
-		if (!ecap_srs(iommu->ecap)) {
-			pr_err_ratelimited("No supervisor request support on %s\n",
-					   iommu->name);
-			return -EINVAL;
-		}
-		pasid_set_sre(pte);
-		/* Enable write protect WP if guest requested */
-		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE)
-			pasid_set_wpe(pte);
-	}
-
-	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
-		if (!ecap_eafs(iommu->ecap)) {
-			pr_err_ratelimited("No extended access flag support on %s\n",
-					   iommu->name);
-			return -EINVAL;
-		}
-		pasid_set_eafe(pte);
-	}
-
-	/*
-	 * Memory type is only applicable to devices inside processor coherent
-	 * domain. Will add MTS support once coherent devices are available.
-	 */
-	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_MTS_MASK) {
-		pr_warn_ratelimited("No memory type support %s\n",
-				    iommu->name);
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-/**
- * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
- * This could be used for guest shared virtual address. In this case, the
- * first level page tables are used for GVA-GPA translation in the guest,
- * second level page tables are used for GPA-HPA translation.
- *
- * @iommu:      IOMMU which the device belong to
- * @dev:        Device to be set up for translation
- * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
- * @pasid:      PASID to be programmed in the device PASID table
- * @pasid_data: Additional PASID info from the guest bind request
- * @domain:     Domain info for setting up second level page tables
- * @addr_width: Address width of the first level (guest)
- */
-int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
-			     pgd_t *gpgd, u32 pasid,
-			     struct iommu_gpasid_bind_data_vtd *pasid_data,
-			     struct dmar_domain *domain, int addr_width)
-{
-	struct pasid_entry *pte;
-	struct dma_pte *pgd;
-	int ret = 0;
-	u64 pgd_val;
-	int agaw;
-	u16 did;
-
-	if (!ecap_nest(iommu->ecap)) {
-		pr_err_ratelimited("IOMMU: %s: No nested translation support\n",
-				   iommu->name);
-		return -EINVAL;
-	}
-
-	if (!(domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
-		pr_err_ratelimited("Domain is not in nesting mode, %x\n",
-				   domain->flags);
-		return -EINVAL;
-	}
-
-	pte = intel_pasid_get_entry(dev, pasid);
-	if (WARN_ON(!pte))
-		return -EINVAL;
-
-	/*
-	 * Caller must ensure PASID entry is not in use, i.e. not bind the
-	 * same PASID to the same device twice.
-	 */
-	if (pasid_pte_is_present(pte))
-		return -EBUSY;
-
-	pasid_clear_entry(pte);
-
-	/* Sanity checking performed by caller to make sure address
-	 * width matching in two dimensions:
-	 * 1. CPU vs. IOMMU
-	 * 2. Guest vs. Host.
-	 */
-	switch (addr_width) {
-#ifdef CONFIG_X86
-	case ADDR_WIDTH_5LEVEL:
-		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
-		    !cap_5lp_support(iommu->cap)) {
-			dev_err_ratelimited(dev,
-					    "5-level paging not supported\n");
-			return -EINVAL;
-		}
-
-		pasid_set_flpm(pte, 1);
-		break;
-#endif
-	case ADDR_WIDTH_4LEVEL:
-		pasid_set_flpm(pte, 0);
-		break;
-	default:
-		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
-				    addr_width);
-		return -EINVAL;
-	}
-
-	/* First level PGD is in GPA, must be supported by the second level */
-	if ((uintptr_t)gpgd > domain->max_addr) {
-		dev_err_ratelimited(dev,
-				    "Guest PGD %lx not supported, max %llx\n",
-				    (uintptr_t)gpgd, domain->max_addr);
-		return -EINVAL;
-	}
-	pasid_set_flptr(pte, (uintptr_t)gpgd);
-
-	ret = intel_pasid_setup_bind_data(iommu, pte, pasid_data);
-	if (ret)
-		return ret;
-
-	/* Setup the second level based on the given domain */
-	pgd = domain->pgd;
-
-	agaw = iommu_skip_agaw(domain, iommu, &pgd);
-	if (agaw < 0) {
-		dev_err_ratelimited(dev, "Invalid domain page table\n");
-		return -EINVAL;
-	}
-	pgd_val = virt_to_phys(pgd);
-	pasid_set_slptr(pte, pgd_val);
-	pasid_set_fault_enable(pte);
-
-	did = domain->iommu_did[iommu->seq_id];
-	pasid_set_domain_id(pte, did);
-
-	pasid_set_address_width(pte, agaw);
-	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
-
-	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
-	pasid_set_present(pte);
-	pasid_flush_caches(iommu, pte, pasid, did);
-
-	return ret;
-}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..d04c83dd3a58 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -318,193 +318,6 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
 	return 0;
 }
 
-int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
-			  struct iommu_gpasid_bind_data *data)
-{
-	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
-	struct intel_svm_dev *sdev = NULL;
-	struct dmar_domain *dmar_domain;
-	struct device_domain_info *info;
-	struct intel_svm *svm = NULL;
-	unsigned long iflags;
-	int ret = 0;
-
-	if (WARN_ON(!iommu) || !data)
-		return -EINVAL;
-
-	if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
-		return -EINVAL;
-
-	/* IOMMU core ensures argsz is more than the start of the union */
-	if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
-		return -EINVAL;
-
-	/* Make sure no undefined flags are used in vendor data */
-	if (data->vendor.vtd.flags & ~(IOMMU_SVA_VTD_GPASID_LAST - 1))
-		return -EINVAL;
-
-	if (!dev_is_pci(dev))
-		return -ENOTSUPP;
-
-	/* VT-d supports devices with full 20 bit PASIDs only */
-	if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
-		return -EINVAL;
-
-	/*
-	 * We only check host PASID range, we have no knowledge to check
-	 * guest PASID range.
-	 */
-	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
-		return -EINVAL;
-
-	info = get_domain_info(dev);
-	if (!info)
-		return -EINVAL;
-
-	dmar_domain = to_dmar_domain(domain);
-
-	mutex_lock(&pasid_mutex);
-	ret = pasid_to_svm_sdev(dev, data->hpasid, &svm, &sdev);
-	if (ret)
-		goto out;
-
-	if (sdev) {
-		/*
-		 * Do not allow multiple bindings of the same device-PASID since
-		 * there is only one SL page tables per PASID. We may revisit
-		 * once sharing PGD across domains are supported.
-		 */
-		dev_warn_ratelimited(dev, "Already bound with PASID %u\n",
-				     svm->pasid);
-		ret = -EBUSY;
-		goto out;
-	}
-
-	if (!svm) {
-		/* We come here when PASID has never been bond to a device. */
-		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
-		if (!svm) {
-			ret = -ENOMEM;
-			goto out;
-		}
-		/* REVISIT: upper layer/VFIO can track host process that bind
-		 * the PASID. ioasid_set = mm might be sufficient for vfio to
-		 * check pasid VMM ownership. We can drop the following line
-		 * once VFIO and IOASID set check is in place.
-		 */
-		svm->mm = get_task_mm(current);
-		svm->pasid = data->hpasid;
-		if (data->flags & IOMMU_SVA_GPASID_VAL) {
-			svm->gpasid = data->gpasid;
-			svm->flags |= SVM_FLAG_GUEST_PASID;
-		}
-		pasid_private_add(data->hpasid, svm);
-		INIT_LIST_HEAD_RCU(&svm->devs);
-		mmput(svm->mm);
-	}
-	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
-	if (!sdev) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	sdev->dev = dev;
-	sdev->sid = PCI_DEVID(info->bus, info->devfn);
-	sdev->iommu = iommu;
-
-	/* Only count users if device has aux domains */
-	if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
-		sdev->users = 1;
-
-	/* Set up device context entry for PASID if not enabled already */
-	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
-	if (ret) {
-		dev_err_ratelimited(dev, "Failed to enable PASID capability\n");
-		kfree(sdev);
-		goto out;
-	}
-
-	/*
-	 * PASID table is per device for better security. Therefore, for
-	 * each bind of a new device even with an existing PASID, we need to
-	 * call the nested mode setup function here.
-	 */
-	spin_lock_irqsave(&iommu->lock, iflags);
-	ret = intel_pasid_setup_nested(iommu, dev,
-				       (pgd_t *)(uintptr_t)data->gpgd,
-				       data->hpasid, &data->vendor.vtd, dmar_domain,
-				       data->addr_width);
-	spin_unlock_irqrestore(&iommu->lock, iflags);
-	if (ret) {
-		dev_err_ratelimited(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
-				    data->hpasid, ret);
-		/*
-		 * PASID entry should be in cleared state if nested mode
-		 * set up failed. So we only need to clear IOASID tracking
-		 * data such that free call will succeed.
-		 */
-		kfree(sdev);
-		goto out;
-	}
-
-	svm->flags |= SVM_FLAG_GUEST_MODE;
-
-	init_rcu_head(&sdev->rcu);
-	list_add_rcu(&sdev->list, &svm->devs);
- out:
-	if (!IS_ERR_OR_NULL(svm) && list_empty(&svm->devs)) {
-		pasid_private_remove(data->hpasid);
-		kfree(svm);
-	}
-
-	mutex_unlock(&pasid_mutex);
-	return ret;
-}
-
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
-{
-	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
-	struct intel_svm_dev *sdev;
-	struct intel_svm *svm;
-	int ret;
-
-	if (WARN_ON(!iommu))
-		return -EINVAL;
-
-	mutex_lock(&pasid_mutex);
-	ret = pasid_to_svm_sdev(dev, pasid, &svm, &sdev);
-	if (ret)
-		goto out;
-
-	if (sdev) {
-		if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
-			sdev->users--;
-		if (!sdev->users) {
-			list_del_rcu(&sdev->list);
-			intel_pasid_tear_down_entry(iommu, dev,
-						    svm->pasid, false);
-			intel_svm_drain_prq(dev, svm->pasid);
-			kfree_rcu(sdev, rcu);
-
-			if (list_empty(&svm->devs)) {
-				/*
-				 * We do not free the IOASID here in that
-				 * IOMMU driver did not allocate it.
-				 * Unlike native SVM, IOASID for guest use was
-				 * allocated prior to the bind call.
-				 * In any case, if the free call comes before
-				 * the unbind, IOMMU driver will get notified
-				 * and perform cleanup.
-				 */
-				pasid_private_remove(pasid);
-				kfree(svm);
-			}
-		}
-	}
-out:
-	mutex_unlock(&pasid_mutex);
-	return ret;
-}
-
 static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
 				 unsigned int flags)
 {
@@ -1125,28 +938,6 @@ int intel_svm_page_response(struct device *dev,
 		goto out;
 	}
 
-	/*
-	 * For responses from userspace, need to make sure that the
-	 * pasid has been bound to its mm.
-	 */
-	if (svm->flags & SVM_FLAG_GUEST_MODE) {
-		struct mm_struct *mm;
-
-		mm = get_task_mm(current);
-		if (!mm) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		if (mm != svm->mm) {
-			ret = -ENODEV;
-			mmput(mm);
-			goto out;
-		}
-
-		mmput(mm);
-	}
-
 	/*
 	 * Per VT-d spec. v3.0 ch7.7, system software must respond
 	 * with page group response if private data is present (PDP)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/7] iommu: Remove guest pasid related interfaces and definitions
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
  2022-01-24  7:10 ` [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks Lu Baolu
@ 2022-01-24  7:10 ` Lu Baolu
  2022-01-24  9:26   ` Christoph Hellwig
  2022-01-24  7:10 ` [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks Lu Baolu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:10 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

The guest pasid related uapi interfaces and definitions are not referenced
anywhere in the tree. We've also reached a consensus to replace them with
a new iommufd design. Remove them to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h      |  44 --------
 include/uapi/linux/iommu.h | 181 --------------------------------
 drivers/iommu/iommu.c      | 210 -------------------------------------
 3 files changed, 435 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8e13c69980be..d1e9c3d73966 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -229,9 +229,6 @@ struct iommu_iotlb_gather {
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
- * @cache_invalidate: invalidate translation caches
- * @sva_bind_gpasid: bind guest pasid and mm
- * @sva_unbind_gpasid: unbind guest pasid and mm
  * @def_domain_type: device default domain type, return value:
  *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
  *		- IOMMU_DOMAIN_DMA: must use a dma domain
@@ -301,12 +298,6 @@ struct iommu_ops {
 	int (*page_response)(struct device *dev,
 			     struct iommu_fault_event *evt,
 			     struct iommu_page_response *msg);
-	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
-				struct iommu_cache_invalidate_info *inv_info);
-	int (*sva_bind_gpasid)(struct iommu_domain *domain,
-			struct device *dev, struct iommu_gpasid_bind_data *data);
-
-	int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
 
 	int (*def_domain_type)(struct device *dev);
 
@@ -422,14 +413,6 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
-extern int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
-				       struct device *dev,
-				       void __user *uinfo);
-
-extern int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
-				      struct device *dev, void __user *udata);
-extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
-					struct device *dev, void __user *udata);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
 				   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
@@ -1052,33 +1035,6 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 	return IOMMU_PASID_INVALID;
 }
 
-static inline int
-iommu_uapi_cache_invalidate(struct iommu_domain *domain,
-			    struct device *dev,
-			    struct iommu_cache_invalidate_info *inv_info)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
-					     struct device *dev, void __user *udata)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
-					       struct device *dev, void __user *udata)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-					  struct device *dev,
-					  ioasid_t pasid)
-{
-	return -ENODEV;
-}
-
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 59178fc229ca..65d8b0234f69 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -158,185 +158,4 @@ struct iommu_page_response {
 	__u32	code;
 };
 
-/* defines the granularity of the invalidation */
-enum iommu_inv_granularity {
-	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
-	IOMMU_INV_GRANU_PASID,	/* PASID-selective invalidation */
-	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
-	IOMMU_INV_GRANU_NR,	/* number of invalidation granularities */
-};
-
-/**
- * struct iommu_inv_addr_info - Address Selective Invalidation Structure
- *
- * @flags: indicates the granularity of the address-selective invalidation
- * - If the PASID bit is set, the @pasid field is populated and the invalidation
- *   relates to cache entries tagged with this PASID and matching the address
- *   range.
- * - If ARCHID bit is set, @archid is populated and the invalidation relates
- *   to cache entries tagged with this architecture specific ID and matching
- *   the address range.
- * - Both PASID and ARCHID can be set as they may tag different caches.
- * - If neither PASID or ARCHID is set, global addr invalidation applies.
- * - The LEAF flag indicates whether only the leaf PTE caching needs to be
- *   invalidated and other paging structure caches can be preserved.
- * @pasid: process address space ID
- * @archid: architecture-specific ID
- * @addr: first stage/level input address
- * @granule_size: page/block size of the mapping in bytes
- * @nb_granules: number of contiguous granules to be invalidated
- */
-struct iommu_inv_addr_info {
-#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
-#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
-#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
-	__u32	flags;
-	__u32	archid;
-	__u64	pasid;
-	__u64	addr;
-	__u64	granule_size;
-	__u64	nb_granules;
-};
-
-/**
- * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure
- *
- * @flags: indicates the granularity of the PASID-selective invalidation
- * - If the PASID bit is set, the @pasid field is populated and the invalidation
- *   relates to cache entries tagged with this PASID and matching the address
- *   range.
- * - If the ARCHID bit is set, the @archid is populated and the invalidation
- *   relates to cache entries tagged with this architecture specific ID and
- *   matching the address range.
- * - Both PASID and ARCHID can be set as they may tag different caches.
- * - At least one of PASID or ARCHID must be set.
- * @pasid: process address space ID
- * @archid: architecture-specific ID
- */
-struct iommu_inv_pasid_info {
-#define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
-#define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
-	__u32	flags;
-	__u32	archid;
-	__u64	pasid;
-};
-
-/**
- * struct iommu_cache_invalidate_info - First level/stage invalidation
- *     information
- * @argsz: User filled size of this data
- * @version: API version of this structure
- * @cache: bitfield that allows to select which caches to invalidate
- * @granularity: defines the lowest granularity used for the invalidation:
- *     domain > PASID > addr
- * @padding: reserved for future use (should be zero)
- * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID
- * @addr_info: invalidation data when @granularity is %IOMMU_INV_GRANU_ADDR
- *
- * Not all the combinations of cache/granularity are valid:
- *
- * +--------------+---------------+---------------+---------------+
- * | type /       |   DEV_IOTLB   |     IOTLB     |      PASID    |
- * | granularity  |               |               |      cache    |
- * +==============+===============+===============+===============+
- * | DOMAIN       |       N/A     |       Y       |       Y       |
- * +--------------+---------------+---------------+---------------+
- * | PASID        |       Y       |       Y       |       Y       |
- * +--------------+---------------+---------------+---------------+
- * | ADDR         |       Y       |       Y       |       N/A     |
- * +--------------+---------------+---------------+---------------+
- *
- * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument other than
- * @version and @cache.
- *
- * If multiple cache types are invalidated simultaneously, they all
- * must support the used granularity.
- */
-struct iommu_cache_invalidate_info {
-	__u32	argsz;
-#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
-	__u32	version;
-/* IOMMU paging structure cache */
-#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
-#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
-#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
-#define IOMMU_CACHE_INV_TYPE_NR		(3)
-	__u8	cache;
-	__u8	granularity;
-	__u8	padding[6];
-	union {
-		struct iommu_inv_pasid_info pasid_info;
-		struct iommu_inv_addr_info addr_info;
-	} granu;
-};
-
-/**
- * struct iommu_gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
- * SVA binding.
- *
- * @flags:	VT-d PASID table entry attributes
- * @pat:	Page attribute table data to compute effective memory type
- * @emt:	Extended memory type
- *
- * Only guest vIOMMU selectable and effective options are passed down to
- * the host IOMMU.
- */
-struct iommu_gpasid_bind_data_vtd {
-#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
-#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
-#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
-#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
-#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
-#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
-#define IOMMU_SVA_VTD_GPASID_WPE	(1 << 6) /* Write protect enable */
-#define IOMMU_SVA_VTD_GPASID_LAST	(1 << 7)
-	__u64 flags;
-	__u32 pat;
-	__u32 emt;
-};
-
-#define IOMMU_SVA_VTD_GPASID_MTS_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
-					 IOMMU_SVA_VTD_GPASID_EMTE | \
-					 IOMMU_SVA_VTD_GPASID_PCD |  \
-					 IOMMU_SVA_VTD_GPASID_PWT)
-
-/**
- * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
- * @argsz:	User filled size of this data
- * @version:	Version of this data structure
- * @format:	PASID table entry format
- * @flags:	Additional information on guest bind request
- * @gpgd:	Guest page directory base of the guest mm to bind
- * @hpasid:	Process address space ID used for the guest mm in host IOMMU
- * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
- * @addr_width:	Guest virtual address width
- * @padding:	Reserved for future use (should be zero)
- * @vtd:	Intel VT-d specific data
- *
- * Guest to host PASID mapping can be an identity or non-identity, where guest
- * has its own PASID space. For non-identify mapping, guest to host PASID lookup
- * is needed when VM programs guest PASID into an assigned device. VMM may
- * trap such PASID programming then request host IOMMU driver to convert guest
- * PASID to host PASID based on this bind data.
- */
-struct iommu_gpasid_bind_data {
-	__u32 argsz;
-#define IOMMU_GPASID_BIND_VERSION_1	1
-	__u32 version;
-#define IOMMU_PASID_FORMAT_INTEL_VTD	1
-#define IOMMU_PASID_FORMAT_LAST		2
-	__u32 format;
-	__u32 addr_width;
-#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
-	__u64 flags;
-	__u64 gpgd;
-	__u64 hpasid;
-	__u64 gpasid;
-	__u8  padding[8];
-	/* Vendor specific data */
-	union {
-		struct iommu_gpasid_bind_data_vtd vtd;
-	} vendor;
-};
-
 #endif /* _UAPI_IOMMU_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6103bcde1f65..5672beb061ea 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2028,216 +2028,6 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
 	return 0;
 }
 
-/*
- * Check flags and other user provided data for valid combinations. We also
- * make sure no reserved fields or unused flags are set. This is to ensure
- * not breaking userspace in the future when these fields or flags are used.
- */
-static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info *info)
-{
-	u32 mask;
-	int i;
-
-	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
-		return -EINVAL;
-
-	mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
-	if (info->cache & ~mask)
-		return -EINVAL;
-
-	if (info->granularity >= IOMMU_INV_GRANU_NR)
-		return -EINVAL;
-
-	switch (info->granularity) {
-	case IOMMU_INV_GRANU_ADDR:
-		if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
-			return -EINVAL;
-
-		mask = IOMMU_INV_ADDR_FLAGS_PASID |
-			IOMMU_INV_ADDR_FLAGS_ARCHID |
-			IOMMU_INV_ADDR_FLAGS_LEAF;
-
-		if (info->granu.addr_info.flags & ~mask)
-			return -EINVAL;
-		break;
-	case IOMMU_INV_GRANU_PASID:
-		mask = IOMMU_INV_PASID_FLAGS_PASID |
-			IOMMU_INV_PASID_FLAGS_ARCHID;
-		if (info->granu.pasid_info.flags & ~mask)
-			return -EINVAL;
-
-		break;
-	case IOMMU_INV_GRANU_DOMAIN:
-		if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
-			return -EINVAL;
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	/* Check reserved padding fields */
-	for (i = 0; i < sizeof(info->padding); i++) {
-		if (info->padding[i])
-			return -EINVAL;
-	}
-
-	return 0;
-}
-
-int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-				void __user *uinfo)
-{
-	struct iommu_cache_invalidate_info inv_info = { 0 };
-	u32 minsz;
-	int ret;
-
-	if (unlikely(!domain->ops->cache_invalidate))
-		return -ENODEV;
-
-	/*
-	 * No new spaces can be added before the variable sized union, the
-	 * minimum size is the offset to the union.
-	 */
-	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
-
-	/* Copy minsz from user to get flags and argsz */
-	if (copy_from_user(&inv_info, uinfo, minsz))
-		return -EFAULT;
-
-	/* Fields before the variable size union are mandatory */
-	if (inv_info.argsz < minsz)
-		return -EINVAL;
-
-	/* PASID and address granu require additional info beyond minsz */
-	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
-	    inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, granu.pasid_info))
-		return -EINVAL;
-
-	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
-	    inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, granu.addr_info))
-		return -EINVAL;
-
-	/*
-	 * User might be using a newer UAPI header which has a larger data
-	 * size, we shall support the existing flags within the current
-	 * size. Copy the remaining user data _after_ minsz but not more
-	 * than the current kernel supported size.
-	 */
-	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
-			   min_t(u32, inv_info.argsz, sizeof(inv_info)) - minsz))
-		return -EFAULT;
-
-	/* Now the argsz is validated, check the content */
-	ret = iommu_check_cache_invl_data(&inv_info);
-	if (ret)
-		return ret;
-
-	return domain->ops->cache_invalidate(domain, dev, &inv_info);
-}
-EXPORT_SYMBOL_GPL(iommu_uapi_cache_invalidate);
-
-static int iommu_check_bind_data(struct iommu_gpasid_bind_data *data)
-{
-	u64 mask;
-	int i;
-
-	if (data->version != IOMMU_GPASID_BIND_VERSION_1)
-		return -EINVAL;
-
-	/* Check the range of supported formats */
-	if (data->format >= IOMMU_PASID_FORMAT_LAST)
-		return -EINVAL;
-
-	/* Check all flags */
-	mask = IOMMU_SVA_GPASID_VAL;
-	if (data->flags & ~mask)
-		return -EINVAL;
-
-	/* Check reserved padding fields */
-	for (i = 0; i < sizeof(data->padding); i++) {
-		if (data->padding[i])
-			return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int iommu_sva_prepare_bind_data(void __user *udata,
-				       struct iommu_gpasid_bind_data *data)
-{
-	u32 minsz;
-
-	/*
-	 * No new spaces can be added before the variable sized union, the
-	 * minimum size is the offset to the union.
-	 */
-	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
-
-	/* Copy minsz from user to get flags and argsz */
-	if (copy_from_user(data, udata, minsz))
-		return -EFAULT;
-
-	/* Fields before the variable size union are mandatory */
-	if (data->argsz < minsz)
-		return -EINVAL;
-	/*
-	 * User might be using a newer UAPI header, we shall let IOMMU vendor
-	 * driver decide on what size it needs. Since the guest PASID bind data
-	 * can be vendor specific, larger argsz could be the result of extension
-	 * for one vendor but it should not affect another vendor.
-	 * Copy the remaining user data _after_ minsz
-	 */
-	if (copy_from_user((void *)data + minsz, udata + minsz,
-			   min_t(u32, data->argsz, sizeof(*data)) - minsz))
-		return -EFAULT;
-
-	return iommu_check_bind_data(data);
-}
-
-int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
-			       void __user *udata)
-{
-	struct iommu_gpasid_bind_data data = { 0 };
-	int ret;
-
-	if (unlikely(!domain->ops->sva_bind_gpasid))
-		return -ENODEV;
-
-	ret = iommu_sva_prepare_bind_data(udata, &data);
-	if (ret)
-		return ret;
-
-	return domain->ops->sva_bind_gpasid(domain, dev, &data);
-}
-EXPORT_SYMBOL_GPL(iommu_uapi_sva_bind_gpasid);
-
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-			     ioasid_t pasid)
-{
-	if (unlikely(!domain->ops->sva_unbind_gpasid))
-		return -ENODEV;
-
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
-
-int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-				 void __user *udata)
-{
-	struct iommu_gpasid_bind_data data = { 0 };
-	int ret;
-
-	if (unlikely(!domain->ops->sva_bind_gpasid))
-		return -ENODEV;
-
-	ret = iommu_sva_prepare_bind_data(udata, &data);
-	if (ret)
-		return ret;
-
-	return iommu_sva_unbind_gpasid(domain, dev, data.hpasid);
-}
-EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
-
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
  2022-01-24  7:10 ` [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks Lu Baolu
  2022-01-24  7:10 ` [PATCH 2/7] iommu: Remove guest pasid related interfaces and definitions Lu Baolu
@ 2022-01-24  7:10 ` Lu Baolu
  2022-01-24  9:26   ` Christoph Hellwig
  2022-01-24  7:10 ` [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops Lu Baolu
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:10 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

The aux-domain related callbacks are not called anywhere in the tree.
Remove them to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/intel-iommu.h   |  17 --
 drivers/iommu/intel/debugfs.c |   3 +-
 drivers/iommu/intel/iommu.c   | 309 +---------------------------------
 3 files changed, 3 insertions(+), 326 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index beaacb589abd..5cfda90b2cca 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -542,7 +542,6 @@ struct dmar_domain {
 	u8 iommu_snooping: 1;		/* indicate snooping control feature */
 
 	struct list_head devices;	/* all devices' list */
-	struct list_head subdevices;	/* all subdevices' list */
 	struct iova_domain iovad;	/* iova's that belong to this domain */
 
 	struct dma_pte	*pgd;		/* virtual address */
@@ -557,11 +556,6 @@ struct dmar_domain {
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
 	u64		max_addr;	/* maximum mapped address */
 
-	u32		default_pasid;	/*
-					 * The default pasid used for non-SVM
-					 * traffic on mediated devices.
-					 */
-
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
 };
@@ -614,21 +608,11 @@ struct intel_iommu {
 	void *perf_statistic;
 };
 
-/* Per subdevice private data */
-struct subdev_domain_info {
-	struct list_head link_phys;	/* link to phys device siblings */
-	struct list_head link_domain;	/* link to domain siblings */
-	struct device *pdev;		/* physical device derived from */
-	struct dmar_domain *domain;	/* aux-domain */
-	int users;			/* user count */
-};
-
 /* PCI domain-device relationship */
 struct device_domain_info {
 	struct list_head link;	/* link to domain siblings */
 	struct list_head global; /* link to global list */
 	struct list_head table;	/* link to pasid table */
-	struct list_head subdevices; /* subdevices sibling */
 	u32 segment;		/* PCI segment number */
 	u8 bus;			/* PCI bus number */
 	u8 devfn;		/* PCI devfn number */
@@ -639,7 +623,6 @@ struct device_domain_info {
 	u8 pri_enabled:1;
 	u8 ats_supported:1;
 	u8 ats_enabled:1;
-	u8 auxd_enabled:1;	/* Multiple domains per device */
 	u8 ats_qdep;
 	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
 	struct intel_iommu *iommu; /* IOMMU used by this device */
diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index 62e23ff3c987..db7a0ca73626 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -351,8 +351,7 @@ static int show_device_domain_translation(struct device *dev, void *data)
 	if (!domain)
 		return 0;
 
-	seq_printf(m, "Device %s with pasid %d @0x%llx\n",
-		   dev_name(dev), domain->default_pasid,
+	seq_printf(m, "Device %s @0x%llx\n", dev_name(dev),
 		   (u64)virt_to_phys(domain->pgd));
 	seq_puts(m, "IOVA_PFN\t\tPML5E\t\t\tPML4E\t\t\tPDPE\t\t\tPDE\t\t\tPTE\n");
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4f9d95067c8f..2b5f4e57a8bb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1573,18 +1573,6 @@ static void domain_update_iotlb(struct dmar_domain *domain)
 			break;
 		}
 
-	if (!has_iotlb_device) {
-		struct subdev_domain_info *sinfo;
-
-		list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
-			info = get_domain_info(sinfo->pdev);
-			if (info && info->ats_enabled) {
-				has_iotlb_device = true;
-				break;
-			}
-		}
-	}
-
 	domain->has_iotlb_device = has_iotlb_device;
 }
 
@@ -1682,7 +1670,6 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 {
 	unsigned long flags;
 	struct device_domain_info *info;
-	struct subdev_domain_info *sinfo;
 
 	if (!domain->has_iotlb_device)
 		return;
@@ -1691,27 +1678,9 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 	list_for_each_entry(info, &domain->devices, link)
 		__iommu_flush_dev_iotlb(info, addr, mask);
 
-	list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
-		info = get_domain_info(sinfo->pdev);
-		__iommu_flush_dev_iotlb(info, addr, mask);
-	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
-static void domain_flush_piotlb(struct intel_iommu *iommu,
-				struct dmar_domain *domain,
-				u64 addr, unsigned long npages, bool ih)
-{
-	u16 did = domain->iommu_did[iommu->seq_id];
-
-	if (domain->default_pasid)
-		qi_flush_piotlb(iommu, did, domain->default_pasid,
-				addr, npages, ih);
-
-	if (!list_empty(&domain->devices))
-		qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
-}
-
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 				  struct dmar_domain *domain,
 				  unsigned long pfn, unsigned int pages,
@@ -1727,7 +1696,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 		ih = 1 << 6;
 
 	if (domain_use_first_level(domain)) {
-		domain_flush_piotlb(iommu, domain, addr, pages, ih);
+		qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
 	} else {
 		/*
 		 * Fallback to domain selective flush if no PSI support or
@@ -1776,7 +1745,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
 		u16 did = dmar_domain->iommu_did[iommu->seq_id];
 
 		if (domain_use_first_level(dmar_domain))
-			domain_flush_piotlb(iommu, dmar_domain, 0, -1, 0);
+			qi_flush_piotlb(iommu, did, PASID_RID2PASID, 0, -1, 0);
 		else
 			iommu->flush.flush_iotlb(iommu, did, 0, 0,
 						 DMA_TLB_DSI_FLUSH);
@@ -1983,7 +1952,6 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
 	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);
-	INIT_LIST_HEAD(&domain->subdevices);
 
 	return domain;
 }
@@ -2676,8 +2644,6 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 	info->domain = domain;
 	info->iommu = iommu;
 	info->pasid_table = NULL;
-	info->auxd_enabled = 0;
-	INIT_LIST_HEAD(&info->subdevices);
 
 	if (dev && dev_is_pci(dev)) {
 		struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -4637,183 +4603,6 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
 		domain_exit(to_dmar_domain(domain));
 }
 
-/*
- * Check whether a @domain could be attached to the @dev through the
- * aux-domain attach/detach APIs.
- */
-static inline bool
-is_aux_domain(struct device *dev, struct iommu_domain *domain)
-{
-	struct device_domain_info *info = get_domain_info(dev);
-
-	return info && info->auxd_enabled &&
-			domain->type == IOMMU_DOMAIN_UNMANAGED;
-}
-
-static inline struct subdev_domain_info *
-lookup_subdev_info(struct dmar_domain *domain, struct device *dev)
-{
-	struct subdev_domain_info *sinfo;
-
-	if (!list_empty(&domain->subdevices)) {
-		list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
-			if (sinfo->pdev == dev)
-				return sinfo;
-		}
-	}
-
-	return NULL;
-}
-
-static int auxiliary_link_device(struct dmar_domain *domain,
-				 struct device *dev)
-{
-	struct device_domain_info *info = get_domain_info(dev);
-	struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
-
-	assert_spin_locked(&device_domain_lock);
-	if (WARN_ON(!info))
-		return -EINVAL;
-
-	if (!sinfo) {
-		sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC);
-		if (!sinfo)
-			return -ENOMEM;
-		sinfo->domain = domain;
-		sinfo->pdev = dev;
-		list_add(&sinfo->link_phys, &info->subdevices);
-		list_add(&sinfo->link_domain, &domain->subdevices);
-	}
-
-	return ++sinfo->users;
-}
-
-static int auxiliary_unlink_device(struct dmar_domain *domain,
-				   struct device *dev)
-{
-	struct device_domain_info *info = get_domain_info(dev);
-	struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
-	int ret;
-
-	assert_spin_locked(&device_domain_lock);
-	if (WARN_ON(!info || !sinfo || sinfo->users <= 0))
-		return -EINVAL;
-
-	ret = --sinfo->users;
-	if (!ret) {
-		list_del(&sinfo->link_phys);
-		list_del(&sinfo->link_domain);
-		kfree(sinfo);
-	}
-
-	return ret;
-}
-
-static int aux_domain_add_dev(struct dmar_domain *domain,
-			      struct device *dev)
-{
-	int ret;
-	unsigned long flags;
-	struct intel_iommu *iommu;
-
-	iommu = device_to_iommu(dev, NULL, NULL);
-	if (!iommu)
-		return -ENODEV;
-
-	if (domain->default_pasid <= 0) {
-		u32 pasid;
-
-		/* No private data needed for the default pasid */
-		pasid = ioasid_alloc(NULL, PASID_MIN,
-				     pci_max_pasids(to_pci_dev(dev)) - 1,
-				     NULL);
-		if (pasid == INVALID_IOASID) {
-			pr_err("Can't allocate default pasid\n");
-			return -ENODEV;
-		}
-		domain->default_pasid = pasid;
-	}
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	ret = auxiliary_link_device(domain, dev);
-	if (ret <= 0)
-		goto link_failed;
-
-	/*
-	 * Subdevices from the same physical device can be attached to the
-	 * same domain. For such cases, only the first subdevice attachment
-	 * needs to go through the full steps in this function. So if ret >
-	 * 1, just goto out.
-	 */
-	if (ret > 1)
-		goto out;
-
-	/*
-	 * iommu->lock must be held to attach domain to iommu and setup the
-	 * pasid entry for second level translation.
-	 */
-	spin_lock(&iommu->lock);
-	ret = domain_attach_iommu(domain, iommu);
-	if (ret)
-		goto attach_failed;
-
-	/* Setup the PASID entry for mediated devices: */
-	if (domain_use_first_level(domain))
-		ret = domain_setup_first_level(iommu, domain, dev,
-					       domain->default_pasid);
-	else
-		ret = intel_pasid_setup_second_level(iommu, domain, dev,
-						     domain->default_pasid);
-	if (ret)
-		goto table_failed;
-
-	spin_unlock(&iommu->lock);
-out:
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	return 0;
-
-table_failed:
-	domain_detach_iommu(domain, iommu);
-attach_failed:
-	spin_unlock(&iommu->lock);
-	auxiliary_unlink_device(domain, dev);
-link_failed:
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-	if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
-		ioasid_put(domain->default_pasid);
-
-	return ret;
-}
-
-static void aux_domain_remove_dev(struct dmar_domain *domain,
-				  struct device *dev)
-{
-	struct device_domain_info *info;
-	struct intel_iommu *iommu;
-	unsigned long flags;
-
-	if (!is_aux_domain(dev, &domain->domain))
-		return;
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	info = get_domain_info(dev);
-	iommu = info->iommu;
-
-	if (!auxiliary_unlink_device(domain, dev)) {
-		spin_lock(&iommu->lock);
-		intel_pasid_tear_down_entry(iommu, dev,
-					    domain->default_pasid, false);
-		domain_detach_iommu(domain, iommu);
-		spin_unlock(&iommu->lock);
-	}
-
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
-		ioasid_put(domain->default_pasid);
-}
-
 static int prepare_domain_attach_device(struct iommu_domain *domain,
 					struct device *dev)
 {
@@ -4866,9 +4655,6 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 		return -EPERM;
 	}
 
-	if (is_aux_domain(dev, domain))
-		return -EPERM;
-
 	/* normally dev is not mapped */
 	if (unlikely(domain_context_mapped(dev))) {
 		struct dmar_domain *old_domain;
@@ -4885,33 +4671,12 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 	return domain_add_dev_info(to_dmar_domain(domain), dev);
 }
 
-static int intel_iommu_aux_attach_device(struct iommu_domain *domain,
-					 struct device *dev)
-{
-	int ret;
-
-	if (!is_aux_domain(dev, domain))
-		return -EPERM;
-
-	ret = prepare_domain_attach_device(domain, dev);
-	if (ret)
-		return ret;
-
-	return aux_domain_add_dev(to_dmar_domain(domain), dev);
-}
-
 static void intel_iommu_detach_device(struct iommu_domain *domain,
 				      struct device *dev)
 {
 	dmar_remove_one_dev_info(dev);
 }
 
-static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
-					  struct device *dev)
-{
-	aux_domain_remove_dev(to_dmar_domain(domain), dev);
-}
-
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot, gfp_t gfp)
@@ -5205,46 +4970,6 @@ static struct iommu_group *intel_iommu_device_group(struct device *dev)
 	return generic_device_group(dev);
 }
 
-static int intel_iommu_enable_auxd(struct device *dev)
-{
-	struct device_domain_info *info;
-	struct intel_iommu *iommu;
-	unsigned long flags;
-	int ret;
-
-	iommu = device_to_iommu(dev, NULL, NULL);
-	if (!iommu || dmar_disabled)
-		return -EINVAL;
-
-	if (!sm_supported(iommu) || !pasid_supported(iommu))
-		return -EINVAL;
-
-	ret = intel_iommu_enable_pasid(iommu, dev);
-	if (ret)
-		return -ENODEV;
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	info = get_domain_info(dev);
-	info->auxd_enabled = 1;
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	return 0;
-}
-
-static int intel_iommu_disable_auxd(struct device *dev)
-{
-	struct device_domain_info *info;
-	unsigned long flags;
-
-	spin_lock_irqsave(&device_domain_lock, flags);
-	info = get_domain_info(dev);
-	if (!WARN_ON(!info))
-		info->auxd_enabled = 0;
-	spin_unlock_irqrestore(&device_domain_lock, flags);
-
-	return 0;
-}
-
 static int intel_iommu_enable_sva(struct device *dev)
 {
 	struct device_domain_info *info = get_domain_info(dev);
@@ -5301,9 +5026,6 @@ static int
 intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
 {
 	switch (feat) {
-	case IOMMU_DEV_FEAT_AUX:
-		return intel_iommu_enable_auxd(dev);
-
 	case IOMMU_DEV_FEAT_IOPF:
 		return intel_iommu_enable_iopf(dev);
 
@@ -5319,9 +5041,6 @@ static int
 intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
 {
 	switch (feat) {
-	case IOMMU_DEV_FEAT_AUX:
-		return intel_iommu_disable_auxd(dev);
-
 	case IOMMU_DEV_FEAT_IOPF:
 		return 0;
 
@@ -5333,26 +5052,6 @@ intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
 	}
 }
 
-static bool
-intel_iommu_dev_feat_enabled(struct device *dev, enum iommu_dev_features feat)
-{
-	struct device_domain_info *info = get_domain_info(dev);
-
-	if (feat == IOMMU_DEV_FEAT_AUX)
-		return scalable_mode_support() && info && info->auxd_enabled;
-
-	return false;
-}
-
-static int
-intel_iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-
-	return dmar_domain->default_pasid > 0 ?
-			dmar_domain->default_pasid : -EINVAL;
-}
-
 static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
 					   struct device *dev)
 {
@@ -5397,9 +5096,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.domain_free		= intel_iommu_domain_free,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
-	.aux_attach_dev		= intel_iommu_aux_attach_device,
-	.aux_detach_dev		= intel_iommu_aux_detach_device,
-	.aux_get_pasid		= intel_iommu_aux_get_pasid,
 	.map_pages		= intel_iommu_map_pages,
 	.unmap_pages		= intel_iommu_unmap_pages,
 	.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
@@ -5412,7 +5108,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.get_resv_regions	= intel_iommu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
 	.device_group		= intel_iommu_device_group,
-	.dev_feat_enabled	= intel_iommu_dev_feat_enabled,
 	.dev_enable_feat	= intel_iommu_dev_enable_feat,
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.is_attach_deferred	= intel_iommu_is_attach_deferred,
-- 
2.25.1



* [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
                   ` (2 preceding siblings ...)
  2022-01-24  7:10 ` [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks Lu Baolu
@ 2022-01-24  7:10 ` Lu Baolu
  2022-01-24  9:27   ` Christoph Hellwig
  2022-01-24  7:11 ` [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain Lu Baolu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:10 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

The aux-domain related interfaces and iommu_ops are not referenced
anywhere in the tree. We have also reached a consensus to redesign this
support on top of the new iommufd framework. Remove them to avoid dead
code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 29 ---------------------------
 drivers/iommu/iommu.c | 46 -------------------------------------------
 2 files changed, 75 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d1e9c3d73966..aa5486243892 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -144,7 +144,6 @@ struct iommu_resv_region {
 
 /**
  * enum iommu_dev_features - Per device IOMMU features
- * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature
  * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses
  * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally
  *			 enabling %IOMMU_DEV_FEAT_SVA requires
@@ -157,7 +156,6 @@ struct iommu_resv_region {
  * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature().
  */
 enum iommu_dev_features {
-	IOMMU_DEV_FEAT_AUX,
 	IOMMU_DEV_FEAT_SVA,
 	IOMMU_DEV_FEAT_IOPF,
 };
@@ -223,8 +221,6 @@ struct iommu_iotlb_gather {
  * @dev_has/enable/disable_feat: per device entries to check/enable/disable
  *                               iommu specific features.
  * @dev_feat_enabled: check enabled feature
- * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
- * @aux_get_pasid: get the pasid given an aux-domain
  * @sva_bind: Bind process address space to device
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
@@ -285,11 +281,6 @@ struct iommu_ops {
 	int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
 	int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
 
-	/* Aux-domain specific attach/detach entries */
-	int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
-	void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
-	int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
-
 	struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
 				      void *drvdata);
 	void (*sva_unbind)(struct iommu_sva *handle);
@@ -656,9 +647,6 @@ void iommu_release_device(struct device *dev);
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
 bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
-int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev);
-void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
-int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
 
 struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 					struct mm_struct *mm,
@@ -1003,23 +991,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
 	return -ENODEV;
 }
 
-static inline int
-iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline void
-iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
-{
-}
-
-static inline int
-iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
-	return -ENODEV;
-}
-
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5672beb061ea..5230c6d90ece 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2746,8 +2746,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
 
 /*
  * The device drivers should do the necessary cleanups before calling this.
- * For example, before disabling the aux-domain feature, the device driver
- * should detach all aux-domains. Otherwise, this will return -EBUSY.
  */
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
 {
@@ -2775,50 +2773,6 @@ bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
 }
 EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
 
-/*
- * Aux-domain specific attach/detach.
- *
- * Only works if iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX) returns
- * true. Also, as long as domains are attached to a device through this
- * interface, any tries to call iommu_attach_device() should fail
- * (iommu_detach_device() can't fail, so we fail when trying to re-attach).
- * This should make us safe against a device being attached to a guest as a
- * whole while there are still pasid users on it (aux and sva).
- */
-int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
-{
-	int ret = -ENODEV;
-
-	if (domain->ops->aux_attach_dev)
-		ret = domain->ops->aux_attach_dev(domain, dev);
-
-	if (!ret)
-		trace_attach_device_to_domain(dev);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_aux_attach_device);
-
-void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
-{
-	if (domain->ops->aux_detach_dev) {
-		domain->ops->aux_detach_dev(domain, dev);
-		trace_detach_device_from_domain(dev);
-	}
-}
-EXPORT_SYMBOL_GPL(iommu_aux_detach_device);
-
-int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
-	int ret = -ENODEV;
-
-	if (domain->ops->aux_get_pasid)
-		ret = domain->ops->aux_get_pasid(domain, dev);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
-
 /**
  * iommu_sva_bind_device() - Bind a process address space to a device
  * @dev: the device
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
                   ` (3 preceding siblings ...)
  2022-01-24  7:10 ` [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops Lu Baolu
@ 2022-01-24  7:11 ` Lu Baolu
  2022-01-24  9:29   ` Christoph Hellwig
  2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:11 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

The supported page sizes of an iommu_domain are saved in its
pgsize_bitmap field rather than in iommu_ops. Retrieve the value from
the right place.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211218074546.1772553-1-baolu.lu@linux.intel.com
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index d0d52c1d4aee..992cc285f2fe 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -133,7 +133,7 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 		 * or equal to the system's PAGE_SIZE, with a preference if
 		 * both are equal.
 		 */
-		pgsize_bitmap = tdev->iommu.domain->ops->pgsize_bitmap;
+		pgsize_bitmap = tdev->iommu.domain->pgsize_bitmap;
 		if (pgsize_bitmap & PAGE_SIZE) {
 			tdev->iommu.pgshift = PAGE_SHIFT;
 		} else {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
                   ` (4 preceding siblings ...)
  2022-01-24  7:11 ` [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain Lu Baolu
@ 2022-01-24  7:11 ` Lu Baolu
  2022-01-24  9:32   ` Christoph Hellwig
                     ` (3 more replies)
  2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
                   ` (2 subsequent siblings)
  8 siblings, 4 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:11 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

The common iommu_ops is hooked to both the device and the domain. When a
helper has both a device and a domain pointer, the way to get the
iommu_ops looks messy in the iommu core. This sorts out how iommu_ops is
retrieved: device-related helpers go through the device pointer, while
domain-related ones go through the domain pointer.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h |  8 ++++++++
 drivers/iommu/iommu.c | 25 ++++++++++++++-----------
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aa5486243892..111b3e9c79bb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 	};
 }
 
+static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
+{
+	if (dev && dev->iommu && dev->iommu->iommu_dev)
+		return dev->iommu->iommu_dev->ops;
+
+	return NULL;
+}
+
 #define IOMMU_BUS_NOTIFY_PRIORITY		0
 #define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
 #define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5230c6d90ece..6631e2ea44df 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(iommu_group_set_name);
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
 					       struct device *dev)
 {
+	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
 	struct iommu_domain *domain = group->default_domain;
 	struct iommu_resv_region *entry;
 	struct list_head mappings;
@@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 		dma_addr_t start, end, addr;
 		size_t map_size = 0;
 
-		if (domain->ops->apply_resv_region)
-			domain->ops->apply_resv_region(dev, domain, entry);
+		if (ops->apply_resv_region)
+			ops->apply_resv_region(dev, domain, entry);
 
 		start = ALIGN(entry->start, pg_size);
 		end   = ALIGN(entry->start + entry->length, pg_size);
@@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 static bool iommu_is_attach_deferred(struct iommu_domain *domain,
 				     struct device *dev)
 {
-	if (domain->ops->is_attach_deferred)
-		return domain->ops->is_attach_deferred(domain, dev);
+	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
+
+	if (ops->is_attach_deferred)
+		return ops->is_attach_deferred(domain, dev);
 
 	return false;
 }
@@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
 	struct iommu_fault_event *evt;
 	struct iommu_fault_page_request *prm;
 	struct dev_iommu *param = dev->iommu;
+	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
 	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 
-	if (!domain || !domain->ops->page_response)
+	if (!ops || !ops->page_response)
 		return -ENODEV;
 
 	if (!param || !param->fault_param)
@@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
 			msg->pasid = 0;
 		}
 
-		ret = domain->ops->page_response(dev, evt, msg);
+		ret = ops->page_response(dev, evt, msg);
 		list_del(&evt->list);
 		kfree(evt);
 		break;
@@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
 
 static int iommu_group_do_probe_finalize(struct device *dev, void *data)
 {
-	struct iommu_domain *domain = data;
+	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
 
-	if (domain->ops->probe_finalize)
-		domain->ops->probe_finalize(dev);
+	if (ops->probe_finalize)
+		ops->probe_finalize(dev);
 
 	return 0;
 }
@@ -2020,7 +2023,7 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
 
 int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
 
 	if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))
 		return __iommu_attach_device(domain, dev);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
                   ` (5 preceding siblings ...)
  2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
@ 2022-01-24  7:11 ` Lu Baolu
  2022-01-24  9:37   ` Christoph Hellwig
                     ` (3 more replies)
  2022-01-24  9:46 ` [PATCH 0/7] iommu cleanup and refactoring Tian, Kevin
  2022-02-08  1:32 ` Lu Baolu
  8 siblings, 4 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-24  7:11 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel, Lu Baolu

Add a domain-specific callback set, domain_ops, for vendor IOMMU drivers
to provide domain-specific operations. Move the domain-specific callbacks
from iommu_ops to the new domain_ops and hook them up when a domain is
allocated.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h                       | 93 ++++++++++++---------
 drivers/iommu/amd/iommu.c                   | 21 +++--
 drivers/iommu/apple-dart.c                  | 24 ++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 23 +++--
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     | 17 ++--
 drivers/iommu/exynos-iommu.c                | 17 ++--
 drivers/iommu/fsl_pamu_domain.c             | 13 ++-
 drivers/iommu/intel/iommu.c                 | 25 ++++--
 drivers/iommu/iommu.c                       | 15 ++--
 drivers/iommu/ipmmu-vmsa.c                  | 21 +++--
 drivers/iommu/msm_iommu.c                   | 17 ++--
 drivers/iommu/mtk_iommu.c                   | 24 ++++--
 drivers/iommu/mtk_iommu_v1.c                | 19 +++--
 drivers/iommu/omap-iommu.c                  | 15 ++--
 drivers/iommu/rockchip-iommu.c              | 17 ++--
 drivers/iommu/s390-iommu.c                  | 15 ++--
 drivers/iommu/sprd-iommu.c                  | 19 +++--
 drivers/iommu/sun50i-iommu.c                | 18 ++--
 drivers/iommu/tegra-gart.c                  | 15 ++--
 drivers/iommu/tegra-smmu.c                  | 16 ++--
 drivers/iommu/virtio-iommu.c                | 18 ++--
 22 files changed, 305 insertions(+), 179 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 111b3e9c79bb..33c5c0e5c465 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -88,7 +88,7 @@ struct iommu_domain_geometry {
 
 struct iommu_domain {
 	unsigned type;
-	const struct iommu_ops *ops;
+	const struct domain_ops *ops;
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	iommu_fault_handler_t handler;
 	void *handler_token;
@@ -192,26 +192,11 @@ struct iommu_iotlb_gather {
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
  * @domain_alloc: allocate iommu domain
- * @domain_free: free iommu domain
- * @attach_dev: attach device to an iommu domain
- * @detach_dev: detach device from an iommu domain
- * @map: map a physically contiguous memory region to an iommu domain
- * @map_pages: map a physically contiguous set of pages of the same size to
- *             an iommu domain.
- * @unmap: unmap a physically contiguous memory region from an iommu domain
- * @unmap_pages: unmap a number of pages of the same size from an iommu domain
- * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
- * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
- * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
- *            queue
- * @iova_to_phys: translate iova to physical address
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
  *                  group and attached to the groups domain
  * @device_group: find iommu group for a particular device
- * @enable_nesting: Enable nesting
- * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
  * @get_resv_regions: Request list of reserved regions for a device
  * @put_resv_regions: Free list of reserved regions for a device
  * @apply_resv_region: Temporary helper call-back for iova reserved ranges
@@ -237,33 +222,11 @@ struct iommu_ops {
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
-	void (*domain_free)(struct iommu_domain *);
 
-	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
-	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
-	int (*map)(struct iommu_domain *domain, unsigned long iova,
-		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
-	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
-			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
-			 int prot, gfp_t gfp, size_t *mapped);
-	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
-	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
-			      size_t pgsize, size_t pgcount,
-			      struct iommu_iotlb_gather *iotlb_gather);
-	void (*flush_iotlb_all)(struct iommu_domain *domain);
-	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
-			       size_t size);
-	void (*iotlb_sync)(struct iommu_domain *domain,
-			   struct iommu_iotlb_gather *iotlb_gather);
-	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
 	void (*probe_finalize)(struct device *dev);
 	struct iommu_group *(*device_group)(struct device *dev);
-	int (*enable_nesting)(struct iommu_domain *domain);
-	int (*set_pgtable_quirks)(struct iommu_domain *domain,
-				  unsigned long quirks);
 
 	/* Request/Free a list of reserved regions for a device */
 	void (*get_resv_regions)(struct device *dev, struct list_head *list);
@@ -296,6 +259,60 @@ struct iommu_ops {
 	struct module *owner;
 };
 
+/**
+ * struct domain_ops - per-domain ops
+ * @attach_dev: attach an iommu domain to a device
+ * @detach_dev: detach an iommu domain from a device
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @map_pages: map a physically contiguous set of pages of the same size to
+ *             an iommu domain.
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
+ * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
+ * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
+ * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
+ *            queue
+ * @iova_to_phys: translate iova to physical address
+ * @enable_nesting: Enable nesting
+ * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
+ * @release: Release the domain after use.
+ */
+struct domain_ops {
+	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
+	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+
+	int (*map)(struct iommu_domain *domain, unsigned long iova,
+		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
+			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
+			 int prot, gfp_t gfp, size_t *mapped);
+	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+			size_t size, struct iommu_iotlb_gather *iotlb_gather);
+	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+			      size_t pgsize, size_t pgcount,
+			      struct iommu_iotlb_gather *iotlb_gather);
+
+	void (*flush_iotlb_all)(struct iommu_domain *domain);
+	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+			       size_t size);
+	void (*iotlb_sync)(struct iommu_domain *domain,
+			   struct iommu_iotlb_gather *iotlb_gather);
+
+	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
+
+	int (*enable_nesting)(struct iommu_domain *domain);
+	int (*set_pgtable_quirks)(struct iommu_domain *domain,
+				  unsigned long quirks);
+
+	void (*release)(struct iommu_domain *domain);
+};
+
+static inline void iommu_domain_set_ops(struct iommu_domain *domain,
+					const struct domain_ops *ops)
+{
+	domain->ops = ops;
+}
+
 /**
  * struct iommu_device - IOMMU core representation of one IOMMU hardware
  *			 instance
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 461f1844ed1f..1d25dc2e0de9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -74,6 +74,7 @@ LIST_HEAD(acpihid_map);
  * if iommu=pt passed on kernel cmd line.
  */
 const struct iommu_ops amd_iommu_ops;
+static const struct domain_ops amd_domain_ops;
 
 static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
 int amd_iommu_max_glx_val = -1;
@@ -1977,6 +1978,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
 	domain->domain.geometry.aperture_start = 0;
 	domain->domain.geometry.aperture_end   = ~0ULL;
 	domain->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&domain->domain, &amd_domain_ops);
 
 	return &domain->domain;
 }
@@ -2269,13 +2271,6 @@ static int amd_iommu_def_domain_type(struct device *dev)
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
-	.domain_free  = amd_iommu_domain_free,
-	.attach_dev = amd_iommu_attach_device,
-	.detach_dev = amd_iommu_detach_device,
-	.map = amd_iommu_map,
-	.iotlb_sync_map	= amd_iommu_iotlb_sync_map,
-	.unmap = amd_iommu_unmap,
-	.iova_to_phys = amd_iommu_iova_to_phys,
 	.probe_device = amd_iommu_probe_device,
 	.release_device = amd_iommu_release_device,
 	.probe_finalize = amd_iommu_probe_finalize,
@@ -2284,9 +2279,19 @@ const struct iommu_ops amd_iommu_ops = {
 	.put_resv_regions = generic_iommu_put_resv_regions,
 	.is_attach_deferred = amd_iommu_is_attach_deferred,
 	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
+	.def_domain_type = amd_iommu_def_domain_type,
+};
+
+static const struct domain_ops amd_domain_ops = {
+	.attach_dev = amd_iommu_attach_device,
+	.detach_dev = amd_iommu_detach_device,
+	.map = amd_iommu_map,
+	.iotlb_sync_map	= amd_iommu_iotlb_sync_map,
+	.unmap = amd_iommu_unmap,
+	.iova_to_phys = amd_iommu_iova_to_phys,
 	.flush_iotlb_all = amd_iommu_flush_iotlb_all,
 	.iotlb_sync = amd_iommu_iotlb_sync,
-	.def_domain_type = amd_iommu_def_domain_type,
+	.release = amd_iommu_domain_free,
 };
 
 /*****************************************************************************
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 565ef5598811..bf40970ec977 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -185,6 +185,7 @@ struct apple_dart_master_cfg {
 
 static struct platform_driver apple_dart_driver;
 static const struct iommu_ops apple_dart_iommu_ops;
+static const struct domain_ops apple_dart_domain_ops;
 
 static struct apple_dart_domain *to_dart_domain(struct iommu_domain *dom)
 {
@@ -589,6 +590,8 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
 	if (type == IOMMU_DOMAIN_IDENTITY || type == IOMMU_DOMAIN_BLOCKED)
 		dart_domain->finalized = true;
 
+	iommu_domain_set_ops(&dart_domain->domain, &apple_dart_domain_ops);
+
 	return &dart_domain->domain;
 }
 
@@ -765,15 +768,6 @@ static void apple_dart_get_resv_regions(struct device *dev,
 
 static const struct iommu_ops apple_dart_iommu_ops = {
 	.domain_alloc = apple_dart_domain_alloc,
-	.domain_free = apple_dart_domain_free,
-	.attach_dev = apple_dart_attach_dev,
-	.detach_dev = apple_dart_detach_dev,
-	.map_pages = apple_dart_map_pages,
-	.unmap_pages = apple_dart_unmap_pages,
-	.flush_iotlb_all = apple_dart_flush_iotlb_all,
-	.iotlb_sync = apple_dart_iotlb_sync,
-	.iotlb_sync_map = apple_dart_iotlb_sync_map,
-	.iova_to_phys = apple_dart_iova_to_phys,
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
 	.device_group = apple_dart_device_group,
@@ -784,6 +778,18 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */
 };
 
+static const struct domain_ops apple_dart_domain_ops = {
+	.attach_dev = apple_dart_attach_dev,
+	.detach_dev = apple_dart_detach_dev,
+	.map_pages = apple_dart_map_pages,
+	.unmap_pages = apple_dart_unmap_pages,
+	.flush_iotlb_all = apple_dart_flush_iotlb_all,
+	.iotlb_sync = apple_dart_iotlb_sync,
+	.iotlb_sync_map = apple_dart_iotlb_sync_map,
+	.iova_to_phys = apple_dart_iova_to_phys,
+	.release = apple_dart_domain_free,
+};
+
 static irqreturn_t apple_dart_irq(int irq, void *dev)
 {
 	struct apple_dart *dart = dev;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6dc6d8b6b368..4461f7d35625 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1981,6 +1981,8 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
 	.tlb_add_page	= arm_smmu_tlb_inv_page_nosync,
 };
 
+static const struct domain_ops arm_smmu_domain_ops;
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -2017,6 +2019,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	INIT_LIST_HEAD(&smmu_domain->devices);
 	spin_lock_init(&smmu_domain->devices_lock);
 	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+	iommu_domain_set_ops(&smmu_domain->domain, &arm_smmu_domain_ops);
 
 	return &smmu_domain->domain;
 }
@@ -2841,17 +2844,9 @@ static int arm_smmu_dev_disable_feature(struct device *dev,
 static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
-	.domain_free		= arm_smmu_domain_free,
-	.attach_dev		= arm_smmu_attach_dev,
-	.map_pages		= arm_smmu_map_pages,
-	.unmap_pages		= arm_smmu_unmap_pages,
-	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
-	.iotlb_sync		= arm_smmu_iotlb_sync,
-	.iova_to_phys		= arm_smmu_iova_to_phys,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.device_group		= arm_smmu_device_group,
-	.enable_nesting		= arm_smmu_enable_nesting,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
@@ -2867,6 +2862,17 @@ static struct iommu_ops arm_smmu_ops = {
 	.owner			= THIS_MODULE,
 };
 
+static const struct domain_ops arm_smmu_domain_ops = {
+	.attach_dev		= arm_smmu_attach_dev,
+	.map_pages		= arm_smmu_map_pages,
+	.unmap_pages		= arm_smmu_unmap_pages,
+	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
+	.iotlb_sync		= arm_smmu_iotlb_sync,
+	.iova_to_phys		= arm_smmu_iova_to_phys,
+	.enable_nesting		= arm_smmu_enable_nesting,
+	.release		= arm_smmu_domain_free,
+};
+
 /* Probing and initialisation functions */
 static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 				   struct arm_smmu_queue *q,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4bc75c4ce402..b59c7a0df781 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -91,6 +91,7 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 
 static struct platform_driver arm_smmu_driver;
 static struct iommu_ops arm_smmu_ops;
+static const struct domain_ops arm_smmu_domain_ops;
 
 #ifdef CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS
 static int arm_smmu_bus_init(struct iommu_ops *ops);
@@ -888,6 +889,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->cb_lock);
+	iommu_domain_set_ops(&smmu_domain->domain, &arm_smmu_domain_ops);
 
 	return &smmu_domain->domain;
 }
@@ -1583,19 +1585,10 @@ static int arm_smmu_def_domain_type(struct device *dev)
 static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
-	.domain_free		= arm_smmu_domain_free,
-	.attach_dev		= arm_smmu_attach_dev,
-	.map_pages		= arm_smmu_map_pages,
-	.unmap_pages		= arm_smmu_unmap_pages,
-	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
-	.iotlb_sync		= arm_smmu_iotlb_sync,
-	.iova_to_phys		= arm_smmu_iova_to_phys,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.probe_finalize		= arm_smmu_probe_finalize,
 	.device_group		= arm_smmu_device_group,
-	.enable_nesting		= arm_smmu_enable_nesting,
-	.set_pgtable_quirks	= arm_smmu_set_pgtable_quirks,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
@@ -1604,6 +1597,18 @@ static struct iommu_ops arm_smmu_ops = {
 	.owner			= THIS_MODULE,
 };
 
+static const struct domain_ops arm_smmu_domain_ops = {
+	.attach_dev		= arm_smmu_attach_dev,
+	.map_pages		= arm_smmu_map_pages,
+	.unmap_pages		= arm_smmu_unmap_pages,
+	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
+	.iotlb_sync		= arm_smmu_iotlb_sync,
+	.iova_to_phys		= arm_smmu_iova_to_phys,
+	.set_pgtable_quirks	= arm_smmu_set_pgtable_quirks,
+	.enable_nesting		= arm_smmu_enable_nesting,
+	.release		= arm_smmu_domain_free,
+};
+
 static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 {
 	int i;
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index b91874cb6cf3..4469e4758dbb 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -78,6 +78,7 @@ static struct qcom_iommu_domain *to_qcom_iommu_domain(struct iommu_domain *dom)
 }
 
 static const struct iommu_ops qcom_iommu_ops;
+static const struct domain_ops qcom_domain_ops;
 
 static struct qcom_iommu_dev * to_iommu(struct device *dev)
 {
@@ -336,6 +337,7 @@ static struct iommu_domain *qcom_iommu_domain_alloc(unsigned type)
 
 	mutex_init(&qcom_domain->init_mutex);
 	spin_lock_init(&qcom_domain->pgtbl_lock);
+	iommu_domain_set_ops(&qcom_domain->domain, &qcom_domain_ops);
 
 	return &qcom_domain->domain;
 }
@@ -590,7 +592,14 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 static const struct iommu_ops qcom_iommu_ops = {
 	.capable	= qcom_iommu_capable,
 	.domain_alloc	= qcom_iommu_domain_alloc,
-	.domain_free	= qcom_iommu_domain_free,
+	.probe_device	= qcom_iommu_probe_device,
+	.release_device	= qcom_iommu_release_device,
+	.device_group	= generic_device_group,
+	.of_xlate	= qcom_iommu_of_xlate,
+	.pgsize_bitmap	= SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+};
+
+static const struct domain_ops qcom_domain_ops = {
 	.attach_dev	= qcom_iommu_attach_dev,
 	.detach_dev	= qcom_iommu_detach_dev,
 	.map		= qcom_iommu_map,
@@ -598,11 +607,7 @@ static const struct iommu_ops qcom_iommu_ops = {
 	.flush_iotlb_all = qcom_iommu_flush_iotlb_all,
 	.iotlb_sync	= qcom_iommu_iotlb_sync,
 	.iova_to_phys	= qcom_iommu_iova_to_phys,
-	.probe_device	= qcom_iommu_probe_device,
-	.release_device	= qcom_iommu_release_device,
-	.device_group	= generic_device_group,
-	.of_xlate	= qcom_iommu_of_xlate,
-	.pgsize_bitmap	= SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+	.release	= qcom_iommu_domain_free,
 };
 
 static int qcom_iommu_sec_ptbl_init(struct device *dev)
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 939ffa768986..b69e8b52f3c4 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -563,6 +563,7 @@ static void sysmmu_tlb_invalidate_entry(struct sysmmu_drvdata *data,
 }
 
 static const struct iommu_ops exynos_iommu_ops;
+static const struct domain_ops exynos_domain_ops;
 
 static int exynos_sysmmu_probe(struct platform_device *pdev)
 {
@@ -767,6 +768,7 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
 	domain->domain.geometry.aperture_start = 0;
 	domain->domain.geometry.aperture_end   = ~0UL;
 	domain->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&domain->domain, &exynos_domain_ops);
 
 	return &domain->domain;
 
@@ -1309,12 +1311,6 @@ static int exynos_iommu_of_xlate(struct device *dev,
 
 static const struct iommu_ops exynos_iommu_ops = {
 	.domain_alloc = exynos_iommu_domain_alloc,
-	.domain_free = exynos_iommu_domain_free,
-	.attach_dev = exynos_iommu_attach_device,
-	.detach_dev = exynos_iommu_detach_device,
-	.map = exynos_iommu_map,
-	.unmap = exynos_iommu_unmap,
-	.iova_to_phys = exynos_iommu_iova_to_phys,
 	.device_group = generic_device_group,
 	.probe_device = exynos_iommu_probe_device,
 	.release_device = exynos_iommu_release_device,
@@ -1322,6 +1318,15 @@ static const struct iommu_ops exynos_iommu_ops = {
 	.of_xlate = exynos_iommu_of_xlate,
 };
 
+static const struct domain_ops exynos_domain_ops = {
+	.attach_dev = exynos_iommu_attach_device,
+	.detach_dev = exynos_iommu_detach_device,
+	.map = exynos_iommu_map,
+	.unmap = exynos_iommu_unmap,
+	.iova_to_phys = exynos_iommu_iova_to_phys,
+	.release = exynos_iommu_domain_free,
+};
+
 static int __init exynos_iommu_init(void)
 {
 	struct device_node *np;
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index a47f47307109..0caae87d1471 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -20,6 +20,7 @@ static DEFINE_SPINLOCK(iommu_lock);
 static struct kmem_cache *fsl_pamu_domain_cache;
 static struct kmem_cache *iommu_devinfo_cache;
 static DEFINE_SPINLOCK(device_domain_lock);
+static const struct domain_ops fsl_pamu_domain_ops;
 
 struct iommu_device pamu_iommu;	/* IOMMU core code handle */
 
@@ -210,6 +211,7 @@ static struct iommu_domain *fsl_pamu_domain_alloc(unsigned type)
 	dma_domain->iommu_domain.geometry.aperture_start = 0;
 	dma_domain->iommu_domain.geometry.aperture_end = (1ULL << 36) - 1;
 	dma_domain->iommu_domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&dma_domain->iommu_domain, &fsl_pamu_domain_ops);
 
 	return &dma_domain->iommu_domain;
 }
@@ -453,15 +455,18 @@ static void fsl_pamu_release_device(struct device *dev)
 static const struct iommu_ops fsl_pamu_ops = {
 	.capable	= fsl_pamu_capable,
 	.domain_alloc	= fsl_pamu_domain_alloc,
-	.domain_free    = fsl_pamu_domain_free,
-	.attach_dev	= fsl_pamu_attach_device,
-	.detach_dev	= fsl_pamu_detach_device,
-	.iova_to_phys	= fsl_pamu_iova_to_phys,
 	.probe_device	= fsl_pamu_probe_device,
 	.release_device	= fsl_pamu_release_device,
 	.device_group   = fsl_pamu_device_group,
 };
 
+static const struct domain_ops fsl_pamu_domain_ops = {
+	.attach_dev	= fsl_pamu_attach_device,
+	.detach_dev	= fsl_pamu_detach_device,
+	.iova_to_phys	= fsl_pamu_iova_to_phys,
+	.release	= fsl_pamu_domain_free,
+};
+
 int __init pamu_domain_init(void)
 {
 	int ret = 0;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2b5f4e57a8bb..1a5ea7c23e4b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -385,6 +385,7 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info,
 }
 
 const struct iommu_ops intel_iommu_ops;
+static const struct domain_ops intel_domain_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
 {
@@ -2772,6 +2773,8 @@ static int __init si_domain_init(int hw)
 		return -EFAULT;
 	}
 
+	iommu_domain_set_ops(&si_domain->domain, &intel_domain_ops);
+
 	if (hw)
 		return 0;
 
@@ -4586,6 +4589,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 		domain->geometry.aperture_end   =
 				__DOMAIN_MAX_ADDR(dmar_domain->gaw);
 		domain->geometry.force_aperture = true;
+		iommu_domain_set_ops(domain, &intel_domain_ops);
 
 		return domain;
 	case IOMMU_DOMAIN_IDENTITY:
@@ -5093,15 +5097,6 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
-	.domain_free		= intel_iommu_domain_free,
-	.attach_dev		= intel_iommu_attach_device,
-	.detach_dev		= intel_iommu_detach_device,
-	.map_pages		= intel_iommu_map_pages,
-	.unmap_pages		= intel_iommu_unmap_pages,
-	.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
-	.flush_iotlb_all        = intel_flush_iotlb_all,
-	.iotlb_sync		= intel_iommu_tlb_sync,
-	.iova_to_phys		= intel_iommu_iova_to_phys,
 	.probe_device		= intel_iommu_probe_device,
 	.probe_finalize		= intel_iommu_probe_finalize,
 	.release_device		= intel_iommu_release_device,
@@ -5121,6 +5116,18 @@ const struct iommu_ops intel_iommu_ops = {
 #endif
 };
 
+static const struct domain_ops intel_domain_ops = {
+	.attach_dev		= intel_iommu_attach_device,
+	.detach_dev		= intel_iommu_detach_device,
+	.map_pages		= intel_iommu_map_pages,
+	.unmap_pages		= intel_iommu_unmap_pages,
+	.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
+	.flush_iotlb_all        = intel_flush_iotlb_all,
+	.iotlb_sync		= intel_iommu_tlb_sync,
+	.iova_to_phys		= intel_iommu_iova_to_phys,
+	.release		= intel_iommu_domain_free,
+};
+
 static void quirk_iommu_igfx(struct pci_dev *dev)
 {
 	if (risky_device(dev))
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6631e2ea44df..b44e5ab12b8a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1954,7 +1954,6 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	if (!domain)
 		return NULL;
 
-	domain->ops  = bus->iommu_ops;
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
@@ -1975,7 +1974,7 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 void iommu_domain_free(struct iommu_domain *domain)
 {
 	iommu_put_dma_cookie(domain);
-	domain->ops->domain_free(domain);
+	domain->ops->release(domain);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
@@ -2248,7 +2247,7 @@ static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			     phys_addr_t paddr, size_t size, int prot,
 			     gfp_t gfp, size_t *mapped)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct domain_ops *ops = domain->ops;
 	size_t pgsize, count;
 	int ret;
 
@@ -2271,7 +2270,7 @@ static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
 static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 		       phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct domain_ops *ops = domain->ops;
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
 	size_t orig_size = size;
@@ -2331,7 +2330,7 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
 		      phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct domain_ops *ops = domain->ops;
 	int ret;
 
 	ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
@@ -2360,7 +2359,7 @@ static size_t __iommu_unmap_pages(struct iommu_domain *domain,
 				  unsigned long iova, size_t size,
 				  struct iommu_iotlb_gather *iotlb_gather)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct domain_ops *ops = domain->ops;
 	size_t pgsize, count;
 
 	pgsize = iommu_pgsize(domain, iova, iova, size, &count);
@@ -2373,7 +2372,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 			    unsigned long iova, size_t size,
 			    struct iommu_iotlb_gather *iotlb_gather)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct domain_ops *ops = domain->ops;
 	size_t unmapped_page, unmapped = 0;
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
@@ -2449,7 +2448,7 @@ static ssize_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 		struct scatterlist *sg, unsigned int nents, int prot,
 		gfp_t gfp)
 {
-	const struct iommu_ops *ops = domain->ops;
+	const struct domain_ops *ops = domain->ops;
 	size_t len = 0, mapped = 0;
 	phys_addr_t start;
 	unsigned int i = 0;
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index ca752bdc710f..6b92130181c3 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -566,6 +566,7 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
 /* -----------------------------------------------------------------------------
  * IOMMU Operations
  */
+static const struct domain_ops ipmmu_domain_ops;
 
 static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
 {
@@ -579,6 +580,7 @@ static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
 		return NULL;
 
 	mutex_init(&domain->mutex);
+	iommu_domain_set_ops(&domain->io_domain, &ipmmu_domain_ops);
 
 	return &domain->io_domain;
 }
@@ -868,14 +870,6 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 
 static const struct iommu_ops ipmmu_ops = {
 	.domain_alloc = ipmmu_domain_alloc,
-	.domain_free = ipmmu_domain_free,
-	.attach_dev = ipmmu_attach_device,
-	.detach_dev = ipmmu_detach_device,
-	.map = ipmmu_map,
-	.unmap = ipmmu_unmap,
-	.flush_iotlb_all = ipmmu_flush_iotlb_all,
-	.iotlb_sync = ipmmu_iotlb_sync,
-	.iova_to_phys = ipmmu_iova_to_phys,
 	.probe_device = ipmmu_probe_device,
 	.release_device = ipmmu_release_device,
 	.probe_finalize = ipmmu_probe_finalize,
@@ -885,6 +879,17 @@ static const struct iommu_ops ipmmu_ops = {
 	.of_xlate = ipmmu_of_xlate,
 };
 
+static const struct domain_ops ipmmu_domain_ops = {
+	.attach_dev = ipmmu_attach_device,
+	.detach_dev = ipmmu_detach_device,
+	.map = ipmmu_map,
+	.unmap = ipmmu_unmap,
+	.flush_iotlb_all = ipmmu_flush_iotlb_all,
+	.iotlb_sync = ipmmu_iotlb_sync,
+	.iova_to_phys = ipmmu_iova_to_phys,
+	.release = ipmmu_domain_free,
+};
+
 /* -----------------------------------------------------------------------------
  * Probe/remove and init
  */
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 3a38352b603f..757e46c84044 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -36,6 +36,7 @@ __asm__ __volatile__ (							\
 static DEFINE_SPINLOCK(msm_iommu_lock);
 static LIST_HEAD(qcom_iommu_devices);
 static struct iommu_ops msm_iommu_ops;
+static const struct domain_ops msm_domain_ops;
 
 struct msm_priv {
 	struct list_head list_attached;
@@ -318,6 +319,7 @@ static struct iommu_domain *msm_iommu_domain_alloc(unsigned type)
 	priv->domain.geometry.aperture_start = 0;
 	priv->domain.geometry.aperture_end   = (1ULL << 32) - 1;
 	priv->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&priv->domain, &msm_domain_ops);
 
 	return &priv->domain;
 
@@ -674,7 +676,14 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 static struct iommu_ops msm_iommu_ops = {
 	.capable = msm_iommu_capable,
 	.domain_alloc = msm_iommu_domain_alloc,
-	.domain_free = msm_iommu_domain_free,
+	.probe_device = msm_iommu_probe_device,
+	.release_device = msm_iommu_release_device,
+	.device_group = generic_device_group,
+	.pgsize_bitmap = MSM_IOMMU_PGSIZES,
+	.of_xlate = qcom_iommu_of_xlate,
+};
+
+static const struct domain_ops msm_domain_ops = {
 	.attach_dev = msm_iommu_attach_dev,
 	.detach_dev = msm_iommu_detach_dev,
 	.map = msm_iommu_map,
@@ -688,11 +697,7 @@ static struct iommu_ops msm_iommu_ops = {
 	.iotlb_sync = NULL,
 	.iotlb_sync_map = msm_iommu_sync_map,
 	.iova_to_phys = msm_iommu_iova_to_phys,
-	.probe_device = msm_iommu_probe_device,
-	.release_device = msm_iommu_release_device,
-	.device_group = generic_device_group,
-	.pgsize_bitmap = MSM_IOMMU_PGSIZES,
-	.of_xlate = qcom_iommu_of_xlate,
+	.release = msm_iommu_domain_free,
 };
 
 static int msm_iommu_probe(struct platform_device *pdev)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 25b834104790..45c05899cfd9 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -131,6 +131,7 @@ struct mtk_iommu_domain {
 };
 
 static const struct iommu_ops mtk_iommu_ops;
+static const struct domain_ops mtk_domain_ops;
 
 static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
 
@@ -440,6 +441,8 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
 	if (!dom)
 		return NULL;
 
+	iommu_domain_set_ops(&dom->domain, &mtk_domain_ops);
+
 	return &dom->domain;
 }
 
@@ -658,15 +661,6 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
 
 static const struct iommu_ops mtk_iommu_ops = {
 	.domain_alloc	= mtk_iommu_domain_alloc,
-	.domain_free	= mtk_iommu_domain_free,
-	.attach_dev	= mtk_iommu_attach_device,
-	.detach_dev	= mtk_iommu_detach_device,
-	.map		= mtk_iommu_map,
-	.unmap		= mtk_iommu_unmap,
-	.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
-	.iotlb_sync	= mtk_iommu_iotlb_sync,
-	.iotlb_sync_map	= mtk_iommu_sync_map,
-	.iova_to_phys	= mtk_iommu_iova_to_phys,
 	.probe_device	= mtk_iommu_probe_device,
 	.release_device	= mtk_iommu_release_device,
 	.device_group	= mtk_iommu_device_group,
@@ -677,6 +671,18 @@ static const struct iommu_ops mtk_iommu_ops = {
 	.owner		= THIS_MODULE,
 };
 
+static const struct domain_ops mtk_domain_ops = {
+	.attach_dev	= mtk_iommu_attach_device,
+	.detach_dev	= mtk_iommu_detach_device,
+	.map		= mtk_iommu_map,
+	.unmap		= mtk_iommu_unmap,
+	.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
+	.iotlb_sync	= mtk_iommu_iotlb_sync,
+	.iotlb_sync_map	= mtk_iommu_sync_map,
+	.iova_to_phys	= mtk_iommu_iova_to_phys,
+	.release	= mtk_iommu_domain_free,
+};
+
 static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
 {
 	u32 regval;
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index be22fcf988ce..10a9d6be42ff 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -87,6 +87,8 @@
  */
 #define M2701_IOMMU_PGT_SIZE			SZ_4M
 
+static const struct domain_ops mtk_domain_ops;
+
 struct mtk_iommu_domain {
 	spinlock_t			pgtlock; /* lock for page table */
 	struct iommu_domain		domain;
@@ -246,6 +248,8 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
 	if (!dom)
 		return NULL;
 
+	iommu_domain_set_ops(&dom->domain, &mtk_domain_ops);
+
 	return &dom->domain;
 }
 
@@ -514,12 +518,6 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
 
 static const struct iommu_ops mtk_iommu_ops = {
 	.domain_alloc	= mtk_iommu_domain_alloc,
-	.domain_free	= mtk_iommu_domain_free,
-	.attach_dev	= mtk_iommu_attach_device,
-	.detach_dev	= mtk_iommu_detach_device,
-	.map		= mtk_iommu_map,
-	.unmap		= mtk_iommu_unmap,
-	.iova_to_phys	= mtk_iommu_iova_to_phys,
 	.probe_device	= mtk_iommu_probe_device,
 	.probe_finalize = mtk_iommu_probe_finalize,
 	.release_device	= mtk_iommu_release_device,
@@ -529,6 +527,15 @@ static const struct iommu_ops mtk_iommu_ops = {
 	.owner          = THIS_MODULE,
 };
 
+static const struct domain_ops mtk_domain_ops = {
+	.attach_dev	= mtk_iommu_attach_device,
+	.detach_dev	= mtk_iommu_detach_device,
+	.map		= mtk_iommu_map,
+	.unmap		= mtk_iommu_unmap,
+	.iova_to_phys	= mtk_iommu_iova_to_phys,
+	.release	= mtk_iommu_domain_free,
+};
+
 static const struct of_device_id mtk_iommu_of_ids[] = {
 	{ .compatible = "mediatek,mt2701-m4u", },
 	{}
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 91749654fd49..ecd9cb589a31 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -33,6 +33,7 @@
 #include "omap-iommu.h"
 
 static const struct iommu_ops omap_iommu_ops;
+static const struct domain_ops omap_domain_ops;
 
 #define to_iommu(dev)	((struct omap_iommu *)dev_get_drvdata(dev))
 
@@ -1582,6 +1583,7 @@ static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
 	omap_domain->domain.geometry.aperture_start = 0;
 	omap_domain->domain.geometry.aperture_end   = (1ULL << 32) - 1;
 	omap_domain->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&omap_domain->domain, &omap_domain_ops);
 
 	return &omap_domain->domain;
 }
@@ -1734,16 +1736,19 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
 
 static const struct iommu_ops omap_iommu_ops = {
 	.domain_alloc	= omap_iommu_domain_alloc,
-	.domain_free	= omap_iommu_domain_free,
+	.probe_device	= omap_iommu_probe_device,
+	.release_device	= omap_iommu_release_device,
+	.device_group	= omap_iommu_device_group,
+	.pgsize_bitmap	= OMAP_IOMMU_PGSIZES,
+};
+
+static const struct domain_ops omap_domain_ops = {
 	.attach_dev	= omap_iommu_attach_dev,
 	.detach_dev	= omap_iommu_detach_dev,
 	.map		= omap_iommu_map,
 	.unmap		= omap_iommu_unmap,
 	.iova_to_phys	= omap_iommu_iova_to_phys,
-	.probe_device	= omap_iommu_probe_device,
-	.release_device	= omap_iommu_release_device,
-	.device_group	= omap_iommu_device_group,
-	.pgsize_bitmap	= OMAP_IOMMU_PGSIZES,
+	.release	= omap_iommu_domain_free,
 };
 
 static int __init omap_iommu_init(void)
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7f23ad61c094..0261c044ded1 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -124,6 +124,7 @@ struct rk_iommudata {
 
 static struct device *dma_dev;
 static const struct rk_iommu_ops *rk_ops;
+static const struct domain_ops rk_domain_ops;
 
 static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma,
 				  unsigned int count)
@@ -1096,6 +1097,7 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
 	rk_domain->domain.geometry.aperture_start = 0;
 	rk_domain->domain.geometry.aperture_end   = DMA_BIT_MASK(32);
 	rk_domain->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&rk_domain->domain, &rk_domain_ops);
 
 	return &rk_domain->domain;
 
@@ -1187,19 +1189,22 @@ static int rk_iommu_of_xlate(struct device *dev,
 
 static const struct iommu_ops rk_iommu_ops = {
 	.domain_alloc = rk_iommu_domain_alloc,
-	.domain_free = rk_iommu_domain_free,
-	.attach_dev = rk_iommu_attach_device,
-	.detach_dev = rk_iommu_detach_device,
-	.map = rk_iommu_map,
-	.unmap = rk_iommu_unmap,
 	.probe_device = rk_iommu_probe_device,
 	.release_device = rk_iommu_release_device,
-	.iova_to_phys = rk_iommu_iova_to_phys,
 	.device_group = rk_iommu_device_group,
 	.pgsize_bitmap = RK_IOMMU_PGSIZE_BITMAP,
 	.of_xlate = rk_iommu_of_xlate,
 };
 
+static const struct domain_ops rk_domain_ops = {
+	.attach_dev = rk_iommu_attach_device,
+	.detach_dev = rk_iommu_detach_device,
+	.map = rk_iommu_map,
+	.unmap = rk_iommu_unmap,
+	.iova_to_phys = rk_iommu_iova_to_phys,
+	.release = rk_iommu_domain_free,
+};
+
 static int rk_iommu_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 50860ebdd087..70011eedef7c 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -20,6 +20,7 @@
 #define S390_IOMMU_PGSIZES	(~0xFFFUL)
 
 static const struct iommu_ops s390_iommu_ops;
+static const struct domain_ops s390_domain_ops;
 
 struct s390_domain {
 	struct iommu_domain	domain;
@@ -71,6 +72,7 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
 	INIT_LIST_HEAD(&s390_domain->devices);
+	iommu_domain_set_ops(&s390_domain->domain, &s390_domain_ops);
 
 	return &s390_domain->domain;
 }
@@ -363,16 +365,19 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
 static const struct iommu_ops s390_iommu_ops = {
 	.capable = s390_iommu_capable,
 	.domain_alloc = s390_domain_alloc,
-	.domain_free = s390_domain_free,
+	.probe_device = s390_iommu_probe_device,
+	.release_device = s390_iommu_release_device,
+	.device_group = generic_device_group,
+	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+};
+
+static const struct domain_ops s390_domain_ops = {
 	.attach_dev = s390_iommu_attach_device,
 	.detach_dev = s390_iommu_detach_device,
 	.map = s390_iommu_map,
 	.unmap = s390_iommu_unmap,
 	.iova_to_phys = s390_iommu_iova_to_phys,
-	.probe_device = s390_iommu_probe_device,
-	.release_device = s390_iommu_release_device,
-	.device_group = generic_device_group,
-	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+	.release = s390_domain_free,
 };
 
 static int __init s390_iommu_init(void)
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 27ac818b0354..9fee9eafbff8 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -81,6 +81,7 @@ struct sprd_iommu_domain {
 };
 
 static const struct iommu_ops sprd_iommu_ops;
+static const struct domain_ops sprd_domain_ops;
 
 static struct sprd_iommu_domain *to_sprd_domain(struct iommu_domain *dom)
 {
@@ -147,6 +148,7 @@ static struct iommu_domain *sprd_iommu_domain_alloc(unsigned int domain_type)
 
 	dom->domain.geometry.aperture_start = 0;
 	dom->domain.geometry.aperture_end = SZ_256M - 1;
+	iommu_domain_set_ops(&dom->domain, &sprd_domain_ops);
 
 	return &dom->domain;
 }
@@ -416,7 +418,15 @@ static int sprd_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 
 static const struct iommu_ops sprd_iommu_ops = {
 	.domain_alloc	= sprd_iommu_domain_alloc,
-	.domain_free	= sprd_iommu_domain_free,
+	.probe_device	= sprd_iommu_probe_device,
+	.release_device	= sprd_iommu_release_device,
+	.device_group	= sprd_iommu_device_group,
+	.of_xlate	= sprd_iommu_of_xlate,
+	.pgsize_bitmap	= ~0UL << SPRD_IOMMU_PAGE_SHIFT,
+	.owner		= THIS_MODULE,
+};
+
+static const struct domain_ops sprd_domain_ops = {
 	.attach_dev	= sprd_iommu_attach_device,
 	.detach_dev	= sprd_iommu_detach_device,
 	.map		= sprd_iommu_map,
@@ -424,12 +434,7 @@ static const struct iommu_ops sprd_iommu_ops = {
 	.iotlb_sync_map	= sprd_iommu_sync_map,
 	.iotlb_sync	= sprd_iommu_sync,
 	.iova_to_phys	= sprd_iommu_iova_to_phys,
-	.probe_device	= sprd_iommu_probe_device,
-	.release_device	= sprd_iommu_release_device,
-	.device_group	= sprd_iommu_device_group,
-	.of_xlate	= sprd_iommu_of_xlate,
-	.pgsize_bitmap	= ~0UL << SPRD_IOMMU_PAGE_SHIFT,
-	.owner		= THIS_MODULE,
+	.release	= sprd_iommu_domain_free,
 };
 
 static const struct of_device_id sprd_iommu_of_match[] = {
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 92997021e188..7b5617695183 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -596,6 +596,8 @@ static phys_addr_t sun50i_iommu_iova_to_phys(struct iommu_domain *domain,
 		sun50i_iova_get_page_offset(iova);
 }
 
+static const struct domain_ops sun50i_domain_ops;
+
 static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
 {
 	struct sun50i_iommu_domain *sun50i_domain;
@@ -619,6 +621,7 @@ static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
 	sun50i_domain->domain.geometry.aperture_start = 0;
 	sun50i_domain->domain.geometry.aperture_end = DMA_BIT_MASK(32);
 	sun50i_domain->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&sun50i_domain->domain, &sun50i_domain_ops);
 
 	return &sun50i_domain->domain;
 
@@ -760,19 +763,22 @@ static int sun50i_iommu_of_xlate(struct device *dev,
 
 static const struct iommu_ops sun50i_iommu_ops = {
 	.pgsize_bitmap	= SZ_4K,
-	.attach_dev	= sun50i_iommu_attach_device,
-	.detach_dev	= sun50i_iommu_detach_device,
 	.device_group	= sun50i_iommu_device_group,
 	.domain_alloc	= sun50i_iommu_domain_alloc,
-	.domain_free	= sun50i_iommu_domain_free,
+	.of_xlate	= sun50i_iommu_of_xlate,
+	.probe_device	= sun50i_iommu_probe_device,
+	.release_device	= sun50i_iommu_release_device,
+};
+
+static const struct domain_ops sun50i_domain_ops = {
+	.attach_dev	= sun50i_iommu_attach_device,
+	.detach_dev	= sun50i_iommu_detach_device,
 	.flush_iotlb_all = sun50i_iommu_flush_iotlb_all,
 	.iotlb_sync	= sun50i_iommu_iotlb_sync,
 	.iova_to_phys	= sun50i_iommu_iova_to_phys,
 	.map		= sun50i_iommu_map,
-	.of_xlate	= sun50i_iommu_of_xlate,
-	.probe_device	= sun50i_iommu_probe_device,
-	.release_device	= sun50i_iommu_release_device,
 	.unmap		= sun50i_iommu_unmap,
+	.release	= sun50i_iommu_domain_free,
 };
 
 static void sun50i_iommu_report_fault(struct sun50i_iommu *iommu,
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 6a358f92c7e5..cada6934c664 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -49,6 +49,7 @@ struct gart_device {
 static struct gart_device *gart_handle; /* unique for a system */
 
 static bool gart_debug;
+static const struct domain_ops gart_domain_ops;
 
 /*
  * Any interaction between any block on PPSB and a block on APB or AHB
@@ -153,6 +154,7 @@ static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
 		domain->geometry.aperture_start = gart_handle->iovmm_base;
 		domain->geometry.aperture_end = gart_handle->iovmm_end - 1;
 		domain->geometry.force_aperture = true;
+		iommu_domain_set_ops(domain, &gart_domain_ops);
 	}
 
 	return domain;
@@ -278,19 +280,22 @@ static void gart_iommu_sync(struct iommu_domain *domain,
 static const struct iommu_ops gart_iommu_ops = {
 	.capable	= gart_iommu_capable,
 	.domain_alloc	= gart_iommu_domain_alloc,
-	.domain_free	= gart_iommu_domain_free,
-	.attach_dev	= gart_iommu_attach_dev,
-	.detach_dev	= gart_iommu_detach_dev,
 	.probe_device	= gart_iommu_probe_device,
 	.release_device	= gart_iommu_release_device,
 	.device_group	= generic_device_group,
+	.pgsize_bitmap	= GART_IOMMU_PGSIZES,
+	.of_xlate	= gart_iommu_of_xlate,
+};
+
+static const struct domain_ops gart_domain_ops = {
+	.attach_dev	= gart_iommu_attach_dev,
+	.detach_dev	= gart_iommu_detach_dev,
 	.map		= gart_iommu_map,
 	.unmap		= gart_iommu_unmap,
 	.iova_to_phys	= gart_iommu_iova_to_phys,
-	.pgsize_bitmap	= GART_IOMMU_PGSIZES,
-	.of_xlate	= gart_iommu_of_xlate,
 	.iotlb_sync_map	= gart_iommu_sync_map,
 	.iotlb_sync	= gart_iommu_sync,
+	.release	= gart_iommu_domain_free,
 };
 
 int tegra_gart_suspend(struct gart_device *gart)
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index e900e3c46903..e9ea0454ba9d 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -62,6 +62,8 @@ struct tegra_smmu_as {
 	u32 attr;
 };
 
+static const struct domain_ops tegra_smmu_domain_ops;
+
 static struct tegra_smmu_as *to_smmu_as(struct iommu_domain *dom)
 {
 	return container_of(dom, struct tegra_smmu_as, domain);
@@ -317,6 +319,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->domain.geometry.aperture_start = 0;
 	as->domain.geometry.aperture_end = 0xffffffff;
 	as->domain.geometry.force_aperture = true;
+	iommu_domain_set_ops(&as->domain, &tegra_smmu_domain_ops);
 
 	return &as->domain;
 }
@@ -969,17 +972,20 @@ static int tegra_smmu_of_xlate(struct device *dev,
 static const struct iommu_ops tegra_smmu_ops = {
 	.capable = tegra_smmu_capable,
 	.domain_alloc = tegra_smmu_domain_alloc,
-	.domain_free = tegra_smmu_domain_free,
-	.attach_dev = tegra_smmu_attach_dev,
-	.detach_dev = tegra_smmu_detach_dev,
 	.probe_device = tegra_smmu_probe_device,
 	.release_device = tegra_smmu_release_device,
 	.device_group = tegra_smmu_device_group,
+	.of_xlate = tegra_smmu_of_xlate,
+	.pgsize_bitmap = SZ_4K,
+};
+
+static const struct domain_ops tegra_smmu_domain_ops = {
+	.attach_dev = tegra_smmu_attach_dev,
+	.detach_dev = tegra_smmu_detach_dev,
 	.map = tegra_smmu_map,
 	.unmap = tegra_smmu_unmap,
 	.iova_to_phys = tegra_smmu_iova_to_phys,
-	.of_xlate = tegra_smmu_of_xlate,
-	.pgsize_bitmap = SZ_4K,
+	.release = tegra_smmu_domain_free,
 };
 
 static void tegra_smmu_ahb_enable(void)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index f2aa34f57454..538409592c6a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -98,6 +98,8 @@ struct viommu_event {
 	};
 };
 
+static const struct domain_ops viommu_domain_ops;
+
 #define to_viommu_domain(domain)	\
 	container_of(domain, struct viommu_domain, domain)
 
@@ -652,6 +654,7 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type)
 	mutex_init(&vdomain->mutex);
 	spin_lock_init(&vdomain->mappings_lock);
 	vdomain->mappings = RB_ROOT_CACHED;
+	iommu_domain_set_ops(&vdomain->domain, &viommu_domain_ops);
 
 	return &vdomain->domain;
 }
@@ -1008,12 +1011,6 @@ static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 
 static struct iommu_ops viommu_ops = {
 	.domain_alloc		= viommu_domain_alloc,
-	.domain_free		= viommu_domain_free,
-	.attach_dev		= viommu_attach_dev,
-	.map			= viommu_map,
-	.unmap			= viommu_unmap,
-	.iova_to_phys		= viommu_iova_to_phys,
-	.iotlb_sync		= viommu_iotlb_sync,
 	.probe_device		= viommu_probe_device,
 	.probe_finalize		= viommu_probe_finalize,
 	.release_device		= viommu_release_device,
@@ -1024,6 +1021,15 @@ static struct iommu_ops viommu_ops = {
 	.owner			= THIS_MODULE,
 };
 
+static const struct domain_ops viommu_domain_ops = {
+	.attach_dev		= viommu_attach_dev,
+	.map			= viommu_map,
+	.unmap			= viommu_unmap,
+	.iova_to_phys		= viommu_iova_to_phys,
+	.iotlb_sync		= viommu_iotlb_sync,
+	.release		= viommu_domain_free,
+};
+
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
 	struct virtio_device *vdev = dev_to_virtio(viommu->dev);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks
  2022-01-24  7:10 ` [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks Lu Baolu
@ 2022-01-24  9:25   ` Christoph Hellwig
  0 siblings, 0 replies; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:25 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7] iommu: Remove guest pasid related interfaces and definitions
  2022-01-24  7:10 ` [PATCH 2/7] iommu: Remove guest pasid related interfaces and definitions Lu Baolu
@ 2022-01-24  9:26   ` Christoph Hellwig
  0 siblings, 0 replies; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:26 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks
  2022-01-24  7:10 ` [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks Lu Baolu
@ 2022-01-24  9:26   ` Christoph Hellwig
  0 siblings, 0 replies; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:26 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops
  2022-01-24  7:10 ` [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops Lu Baolu
@ 2022-01-24  9:27   ` Christoph Hellwig
  0 siblings, 0 replies; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:27 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain
  2022-01-24  7:11 ` [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain Lu Baolu
@ 2022-01-24  9:29   ` Christoph Hellwig
  2022-01-25  2:59     ` Lu Baolu
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:29 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On Mon, Jan 24, 2022 at 03:11:00PM +0800, Lu Baolu wrote:
> The supported page sizes of an iommu_domain are saved in the pgsize_bitmap
> field. Retrieve the value from the right place.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Link: https://lore.kernel.org/r/20211218074546.1772553-1-baolu.lu@linux.intel.com

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Wouldn't it make sense to remove the pgsize_bitmap in struct iommu_ops
and initialize the domain field in the domain_alloc methods?  Or am I
missing something?

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
@ 2022-01-24  9:32   ` Christoph Hellwig
  2022-01-25  3:01     ` Lu Baolu
  2022-01-24  9:48   ` Tian, Kevin
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:32 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
> The common iommu_ops is hooked to both device and domain. When a helper
> has both device and domain pointer, the way to get the iommu_ops looks
> messy in iommu core. This sorts out the way to get iommu_ops. The device
> related helpers go through device pointer, while the domain related ones
> go through domain pointer.

Ugg. This really sounds like we should have different structures for
each set of ops?
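[Editor's sketch of the split Christoph is hinting at: one structure per scope, so each helper has exactly one obvious place to find its ops. All names below are invented stand-ins, not the kernel API:]

```c
#include <stddef.h>

/* Stand-in split: device-scoped ops vs. domain-scoped ops. */
struct demo_device_ops {
	int (*probe)(void);
};

struct demo_domain_ops {
	int (*map)(unsigned long iova);
};

struct demo_device { const struct demo_device_ops *ops; };
struct demo_domain { const struct demo_domain_ops *ops; };

/* Domain helpers look at domain->ops, device helpers at dev->ops. */
static int demo_map(struct demo_domain *domain, unsigned long iova)
{
	return domain->ops->map ? domain->ops->map(iova) : -38; /* -ENOSYS */
}

static int demo_map_noop(unsigned long iova)
{
	(void)iova;
	return 0;
}

static const struct demo_domain_ops demo_dops = { .map = demo_map_noop };

static int demo_roundtrip(void)
{
	struct demo_domain dom = { .ops = &demo_dops };

	return demo_map(&dom, 0x1000);
}
```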

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
@ 2022-01-24  9:37   ` Christoph Hellwig
  2022-01-24 17:24     ` Jason Gunthorpe
  2022-01-25  4:42     ` Lu Baolu
  2022-01-24  9:58   ` Tian, Kevin
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 53+ messages in thread
From: Christoph Hellwig @ 2022-01-24  9:37 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
> Add a domain specific callback set, domain_ops, for vendor iommu driver
> to provide domain specific operations. Move domain-specific callbacks
> from iommu_ops to the domain_ops and hook them when a domain is allocated.

Ah, that's what I meant earlier.  Perfect!

Nit:  I don't think vendor is the right term here.

Maybe something like:

iommu: split struct iommu_ops

Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations.  This solves
the problem of needing to know if the method vector for a given
operation needs to be retrieved from the device or the domain.

> +struct domain_ops {

This needs to keep an iommu in the name, i.e. iommu_domain_ops.

> +	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);

Overly long line.

> +static inline void iommu_domain_set_ops(struct iommu_domain *domain,
> +					const struct domain_ops *ops)
> +{
> +	domain->ops = ops;
> +}

Do we really need this helper?

> +static const struct domain_ops amd_domain_ops;

Can we avoid these forward declarations or would that cause too much
churn?

* RE: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
                   ` (6 preceding siblings ...)
  2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
@ 2022-01-24  9:46 ` Tian, Kevin
  2022-01-24 17:44   ` Jason Gunthorpe
  2022-02-08  1:32 ` Lu Baolu
  8 siblings, 1 reply; 53+ messages in thread
From: Tian, Kevin @ 2022-01-24  9:46 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu, Yi L, Pan, Jacob jun,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, January 24, 2022 3:11 PM
> 
> Hi,
> 
> The guest pasid and aux-domain related code are dead code in current
> iommu subtree. As we have reached a consensus that all these features
> should be based on the new iommufd framework (which is under active
> development), the first part of this series removes and cleanups all
> the dead code.
> 
> The second part of this series refactors the iommu_domain by moving all
> domain-specific ops from iommu_ops to a new domain_ops. This makes an
> iommu_domain self-contained and represent the abstraction of an I/O
> translation table in the IOMMU subsystem. With different type of
> iommu_domain providing different set of ops, it's easier to support more
> types of I/O translation tables.

You may want to give more background on this end goal. In general there
are four IOPT types in iommufd discussions:

1) The one currently tracked by iommu_domain, with a map/unmap semantics
2) The one managed by mm and shared to iommu via sva_bind/unbind ops
3) The one managed by userspace and bound to iommu via iommufd (require nesting)
4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD interface

Currently only 1) and 2) are supported. 2) is supported via sva specific 
structures, tracked outside of iommu_domain.

Following the current direction, 3) and 4) will end up having separate tracking
structures too (some existing bits for 3 are removed by this series).

And this series is preparatory for extending the domain concept to be a 
general object for all IOPT types. And it is an important design direction
to be aligned on within the iommu community.
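[Editor's sketch: Kevin's four I/O page table flavours, modelled as one discriminated domain type. The enum and names are invented here purely to illustrate the taxonomy; only type 1 has the classic map/unmap semantics today:]

```c
/* The four I/O page table flavours from the discussion, as a toy enum. */
enum iopt_type {
	IOPT_KERNEL_MAP,	/* 1) map/unmap, tracked by iommu_domain */
	IOPT_SVA_MM,		/* 2) mm-managed, shared via sva_bind/unbind */
	IOPT_USER_NESTED,	/* 3) userspace-managed, bound via iommufd */
	IOPT_KVM_SHARED,	/* 4) KVM-managed (e.g. EPT), interface TBD */
};

/* Only flavour 1 supports the classic map semantics today. */
static int iopt_supports_map(enum iopt_type t)
{
	return t == IOPT_KERNEL_MAP;
}
```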

Thanks
Kevin

* RE: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
  2022-01-24  9:32   ` Christoph Hellwig
@ 2022-01-24  9:48   ` Tian, Kevin
  2022-01-25  3:04     ` Lu Baolu
  2022-01-24 17:36   ` Jason Gunthorpe
  2022-01-25  0:20   ` Robin Murphy
  3 siblings, 1 reply; 53+ messages in thread
From: Tian, Kevin @ 2022-01-24  9:48 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu, Yi L, Pan, Jacob jun,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, January 24, 2022 3:11 PM
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index aa5486243892..111b3e9c79bb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct
> iommu_iotlb_gather *gather)
>  	};
>  }
> 
> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device
> *dev)

dev_get_iommu_ops or just dev_iommu_ops?

> +{
> +	if (dev && dev->iommu && dev->iommu->iommu_dev)
> +		return dev->iommu->iommu_dev->ops;
> +
> +	return NULL;
> +}

* RE: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
  2022-01-24  9:37   ` Christoph Hellwig
@ 2022-01-24  9:58   ` Tian, Kevin
  2022-01-24 10:16     ` Jean-Philippe Brucker
                       ` (2 more replies)
  2022-01-24 17:55   ` Jason Gunthorpe
  2022-01-25  0:57   ` Robin Murphy
  3 siblings, 3 replies; 53+ messages in thread
From: Tian, Kevin @ 2022-01-24  9:58 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy
  Cc: Alex Williamson, Eric Auger, Liu, Yi L, Pan, Jacob jun,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, January 24, 2022 3:11 PM
> +/**
> + * struct domain_ops - per-domain ops
> + * @attach_dev: attach an iommu domain to a device
> + * @detach_dev: detach an iommu domain from a device

What is the criteria about whether an op should be iommu_ops or domain_ops
when it requires both domain and device pointers like above two (and future
PASID-based attach)?

Other examples include:
	@apply_resv_region
	@is_attach_deferred

Thanks
Kevin

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  9:58   ` Tian, Kevin
@ 2022-01-24 10:16     ` Jean-Philippe Brucker
  2022-01-24 16:33       ` Jason Gunthorpe
  2022-01-24 17:17     ` Jason Gunthorpe
  2022-01-25  4:59     ` Lu Baolu
  2 siblings, 1 reply; 53+ messages in thread
From: Jean-Philippe Brucker @ 2022-01-24 10:16 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy, David Airlie,
	linux-kernel, iommu, Jonathan Hunter, Alex Williamson,
	Thierry Reding, Pan, Jacob jun, Daniel Vetter

On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> > Sent: Monday, January 24, 2022 3:11 PM
> > +/**
> > + * struct domain_ops - per-domain ops
> > + * @attach_dev: attach an iommu domain to a device
> > + * @detach_dev: detach an iommu domain from a device
> 
> What is the criteria about whether an op should be iommu_ops or domain_ops
> when it requires both domain and device pointers like above two (and future
> PASID-based attach)?
> 
> Other examples include:
> 	@apply_resv_region
> 	@is_attach_deferred

Could attach_dev() be an IOMMU op?  So a driver could set the domain ops
in attach_dev() rather than domain_alloc(). That would allow installing
map()/unmap() ops that are tailored for the device's IOMMU, which we don't
know at domain_alloc() time. I'm thinking about a guest that has both
physical and virtual endpoints, which would ideally use different kinds of
domain ops to support both efficiently (caching mode vs page tables).
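[Editor's sketch of Jean-Philippe's idea: a single alloc, with ops swizzled in at attach time once the device's IOMMU is known. All types and op sets below are invented stand-ins:]

```c
#include <stddef.h>

struct demo_domain_ops { int id; };

/* Two invented op sets: page-table based vs. caching-mode based. */
static const struct demo_domain_ops pgtable_ops = { .id = 1 };
static const struct demo_domain_ops caching_ops = { .id = 2 };

struct demo_domain { const struct demo_domain_ops *ops; };
struct demo_device { int is_virtual; };

/* Ops get picked at attach time, once the device's IOMMU is known. */
static void demo_attach(struct demo_domain *d, struct demo_device *dev)
{
	d->ops = dev->is_virtual ? &caching_ops : &pgtable_ops;
}

static int demo_pick(int is_virtual)
{
	struct demo_device dev = { .is_virtual = is_virtual };
	struct demo_domain dom = { .ops = NULL };

	demo_attach(&dom, &dev);
	return dom.ops->id;
}
```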

Thanks,
Jean

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24 10:16     ` Jean-Philippe Brucker
@ 2022-01-24 16:33       ` Jason Gunthorpe
  2022-01-26  9:41         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-24 16:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Tian, Kevin, Lu Baolu, Joerg Roedel, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy, David Airlie,
	linux-kernel, iommu, Jonathan Hunter, Alex Williamson,
	Thierry Reding, Pan, Jacob jun, Daniel Vetter

On Mon, Jan 24, 2022 at 10:16:07AM +0000, Jean-Philippe Brucker wrote:
> On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <baolu.lu@linux.intel.com>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > > +/**
> > > + * struct domain_ops - per-domain ops
> > > + * @attach_dev: attach an iommu domain to a device
> > > + * @detach_dev: detach an iommu domain from a device
> > 
> > What is the criteria about whether an op should be iommu_ops or domain_ops
> > when it requires both domain and device pointers like above two (and future
> > PASID-based attach)?
> > 
> > Other examples include:
> > 	@apply_resv_region
> > 	@is_attach_deferred
> 
> Could attach_dev() be an IOMMU op?  So a driver could set the domain ops
> in attach_dev() rather than domain_alloc(). That would allow to install
> map()/unmap() ops that are tailored for the device's IOMMU, which we don't
> know at domain_alloc() time. 

I think we should be moving toward 'domain_alloc' returning the
correct domain and the way the driver implements the domain shouldn't
change after that.

> I'm thinking about a guest that has both physical and virtual
> endpoints, which would ideally use different kinds of domain ops to
> support both efficiently (caching mode vs page tables)

In this case shouldn't domain_alloc() reached from the struct device
already do the correct thing?

Jason

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  9:58   ` Tian, Kevin
  2022-01-24 10:16     ` Jean-Philippe Brucker
@ 2022-01-24 17:17     ` Jason Gunthorpe
  2022-01-25  0:58       ` Tian, Kevin
  2022-01-25  4:59     ` Lu Baolu
  2 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-24 17:17 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, Joerg Roedel, Christoph Hellwig, Ben Skeggs, Raj,
	Ashok, Will Deacon, Robin Murphy, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> > Sent: Monday, January 24, 2022 3:11 PM
> > +/**
> > + * struct domain_ops - per-domain ops
> > + * @attach_dev: attach an iommu domain to a device
> > + * @detach_dev: detach an iommu domain from a device
> 
> What is the criteria about whether an op should be iommu_ops or domain_ops
> when it requires both domain and device pointers like above two (and future
> PASID-based attach)?
> 
> Other examples include:
> 	@apply_resv_region

For apply_resv_region the 'dev' argument is really selecting a device
that is already attached to the domain, so it should be in the domain
ops.

> 	@is_attach_deferred

Only two drivers implement this and neither use the domain
argument. Remove the domain argument and keep it in the iommu_ops

Jason

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  9:37   ` Christoph Hellwig
@ 2022-01-24 17:24     ` Jason Gunthorpe
  2022-01-25  4:43       ` Lu Baolu
  2022-01-25  4:42     ` Lu Baolu
  1 sibling, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-24 17:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Lu Baolu, Joerg Roedel, Ben Skeggs, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Alex Williamson, Eric Auger, Liu Yi L,
	Jacob jun Pan, David Airlie, Daniel Vetter, Thierry Reding,
	Jonathan Hunter, iommu, linux-kernel

On Mon, Jan 24, 2022 at 01:37:21AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
> > Add a domain specific callback set, domain_ops, for vendor iommu driver
> > to provide domain specific operations. Move domain-specific callbacks
> > from iommu_ops to the domain_ops and hook them when a domain is allocated.
> 
> Ah, that's what I meant earlier.  Perfect!
> 
> Nit:  I don't think vendor is the right term here.

+1 please don't use the confusing 'vendor' in the kernel; if you feel
you want to write 'vendor', writing 'device driver' is usually a good
choice..

This whole series is nice, the few small notes aside, thanks for
making it!

Jason

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
  2022-01-24  9:32   ` Christoph Hellwig
  2022-01-24  9:48   ` Tian, Kevin
@ 2022-01-24 17:36   ` Jason Gunthorpe
  2022-01-25  1:11     ` Tian, Kevin
  2022-01-25  3:18     ` Lu Baolu
  2022-01-25  0:20   ` Robin Murphy
  3 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-24 17:36 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Christoph Hellwig, Ben Skeggs, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Alex Williamson,
	Eric Auger, Liu Yi L, Jacob jun Pan, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
> The common iommu_ops is hooked to both device and domain. When a helper
> has both device and domain pointer, the way to get the iommu_ops looks
> messy in iommu core. This sorts out the way to get iommu_ops. The device
> related helpers go through device pointer, while the domain related ones
> go through domain pointer.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>  include/linux/iommu.h |  8 ++++++++
>  drivers/iommu/iommu.c | 25 ++++++++++++++-----------
>  2 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index aa5486243892..111b3e9c79bb 100644
> +++ b/include/linux/iommu.h
> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
>  	};
>  }
>  
> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
> +{
> +	if (dev && dev->iommu && dev->iommu->iommu_dev)
> +		return dev->iommu->iommu_dev->ops;
> +
> +	return NULL;

What is the purpose of this helper?

> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>  	struct iommu_domain *domain = group->default_domain;
>  	struct iommu_resv_region *entry;
>  	struct list_head mappings;
> @@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  		dma_addr_t start, end, addr;
>  		size_t map_size = 0;
>  
> -		if (domain->ops->apply_resv_region)
> -			domain->ops->apply_resv_region(dev, domain, entry);
> +		if (ops->apply_resv_region)
> +			ops->apply_resv_region(dev, domain, entry);

Here we call it and don't check for NULL? So why did we check the
interior pointers in the helper?

> @@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  static bool iommu_is_attach_deferred(struct iommu_domain *domain,
>  				     struct device *dev)
>  {
> -	if (domain->ops->is_attach_deferred)
> -		return domain->ops->is_attach_deferred(domain, dev);
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> +
> +	if (ops->is_attach_deferred)
> +		return ops->is_attach_deferred(domain, dev);

Same here, at least return false if ops is null..
  
> @@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
>  	struct iommu_fault_event *evt;
>  	struct iommu_fault_page_request *prm;
>  	struct dev_iommu *param = dev->iommu;
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>  	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>  
> -	if (!domain || !domain->ops->page_response)
> +	if (!ops || !ops->page_response)
>  		return -ENODEV;
>  
>  	if (!param || !param->fault_param)
> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>  			msg->pasid = 0;
>  		}
>  
> -		ret = domain->ops->page_response(dev, evt, msg);
> +		ret = ops->page_response(dev, evt, msg);
>  		list_del(&evt->list);
>  		kfree(evt);
>  		break;

Feels weird that page_response is not connected to a domain, the fault
originated from a domain after all. I would say this op should be
moved to the domain and the caller should provide the a pointer to the
domain that originated the fault.

Ideally since only some domains will be configured to handle faults
at all - domains that can't do this should have a NULL page_response
op, even if other domains created by the same device driver could
handle page_response..
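[Editor's sketch of Jason's point: page_response becomes a nullable domain op, where NULL means "this domain was never configured for faults". Stand-in types and a simplified callback signature, not the real one:]

```c
#include <stddef.h>

struct demo_domain {
	/* NULL means this domain was never configured to handle faults. */
	int (*page_response)(int pasid);
};

static int demo_page_response(struct demo_domain *d, int pasid)
{
	if (!d->page_response)
		return -19;	/* -ENODEV */
	return d->page_response(pasid);
}

static int demo_ok(int pasid)
{
	(void)pasid;
	return 0;
}

static int demo_try(int fault_capable)
{
	struct demo_domain d = {
		.page_response = fault_capable ? demo_ok : NULL,
	};

	return demo_page_response(&d, 1);
}
```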

> @@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
>  
>  static int iommu_group_do_probe_finalize(struct device *dev, void *data)
>  {
> -	struct iommu_domain *domain = data;
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>  
> -	if (domain->ops->probe_finalize)
> -		domain->ops->probe_finalize(dev);
> +	if (ops->probe_finalize)
> +		ops->probe_finalize(dev);

This is an oddball one too, it is finishing setting up the default
domain for a device? Several drivers seem to recover the default
domain in their implementations..

Jason

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-24  9:46 ` [PATCH 0/7] iommu cleanup and refactoring Tian, Kevin
@ 2022-01-24 17:44   ` Jason Gunthorpe
  2022-01-25  1:11     ` Tian, Kevin
  2022-01-25 14:48     ` Robin Murphy
  0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-24 17:44 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, Joerg Roedel, Christoph Hellwig, Ben Skeggs, Raj,
	Ashok, Will Deacon, Robin Murphy, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Mon, Jan 24, 2022 at 09:46:26AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> > Sent: Monday, January 24, 2022 3:11 PM
> > 
> > Hi,
> > 
> > The guest pasid and aux-domain related code are dead code in current
> > iommu subtree. As we have reached a consensus that all these features
> > should be based on the new iommufd framework (which is under active
> > development), the first part of this series removes and cleanups all
> > the dead code.
> > 
> > The second part of this series refactors the iommu_domain by moving all
> > domain-specific ops from iommu_ops to a new domain_ops. This makes an
> > iommu_domain self-contained and represent the abstraction of an I/O
> > translation table in the IOMMU subsystem. With different type of
> > iommu_domain providing different set of ops, it's easier to support more
> > types of I/O translation tables.
> 
> You may want to give more background on this end goal. In general there
> are four IOPT types in iommufd discussions:
> 
> 1) The one currently tracked by iommu_domain, with a map/unmap semantics
> 2) The one managed by mm and shared to iommu via sva_bind/unbind ops
> 3) The one managed by userspace and bound to iommu via iommufd (require nesting)
> 4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD interface

Yes, at least from an iommufd perspective I'd like to see one struct
for all of these types, mainly so we can have a uniform alloc, attach
and detach flow for all io page table types.

If we want to use the iommu_domain, or make iommu_domain a sub-class
of a new struct, can be determined as we go along.

Regardless, I think this cleanup stands on its own. Moving the ops and
purging the dead code is clearly the right thing to do.

Thanks,
Jason

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
  2022-01-24  9:37   ` Christoph Hellwig
  2022-01-24  9:58   ` Tian, Kevin
@ 2022-01-24 17:55   ` Jason Gunthorpe
  2022-01-25  5:04     ` Lu Baolu
  2022-01-25  0:57   ` Robin Murphy
  3 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-24 17:55 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Christoph Hellwig, Ben Skeggs, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Alex Williamson,
	Eric Auger, Liu Yi L, Jacob jun Pan, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
> -	int (*enable_nesting)(struct iommu_domain *domain);

Perhaps for another series, but enable_nesting looks like dead code
too, AFAICT.

Or at the very least I can't figure how out VFIO_TYPE1_NESTING_IOMMU
is supposed to work, or find any implementation of it in userspace.

8 years after it was merged in commit f5c9ecebaf2a ("vfio/iommu_type1:
add new VFIO_TYPE1_NESTING_IOMMU IOMMU type")

The commit comment says 'which are nested with the existing
translation installed by VFIO.' but it doesn't explain how the newly
created domain knows what its parent nest is..

Lu, there is an implementation in the Intel driver here, is it usable
at all?

Jason

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
                     ` (2 preceding siblings ...)
  2022-01-24 17:36   ` Jason Gunthorpe
@ 2022-01-25  0:20   ` Robin Murphy
  2022-01-25  3:54     ` Lu Baolu
  3 siblings, 1 reply; 53+ messages in thread
From: Robin Murphy @ 2022-01-25  0:20 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Kevin Tian, Ashok Raj, Will Deacon
  Cc: David Airlie, linux-kernel, iommu, Jonathan Hunter,
	Alex Williamson, Thierry Reding, Jacob jun Pan, Daniel Vetter

On 2022-01-24 07:11, Lu Baolu wrote:
> The common iommu_ops is hooked to both device and domain. When a helper
> has both device and domain pointer, the way to get the iommu_ops looks
> messy in iommu core. This sorts out the way to get iommu_ops. The device
> related helpers go through device pointer, while the domain related ones
> go through domain pointer.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>   include/linux/iommu.h |  8 ++++++++
>   drivers/iommu/iommu.c | 25 ++++++++++++++-----------
>   2 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index aa5486243892..111b3e9c79bb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
>   	};
>   }
>   
> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
> +{
> +	if (dev && dev->iommu && dev->iommu->iommu_dev)
> +		return dev->iommu->iommu_dev->ops;
> +
> +	return NULL;

This probably warrants at least a WARN, but it's arguable to just assume 
that valid ops must be installed if iommu_probe_device() has succeeded. 
The device ops are essentially for internal use within the IOMMU 
subsystem itself, so we should be able to trust ourselves not to misuse 
the helper.
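[Editor's sketch of Robin's point: after a successful probe the interior pointers are guaranteed, so the helper can assert rather than return NULL. User-space stand-ins for the dev->iommu->iommu_dev->ops chain; in-kernel this would be a WARN_ON, not an assert:]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the dev->iommu->iommu_dev->ops pointer chain. */
struct demo_ops { int unused; };
struct demo_iommu_dev { const struct demo_ops *ops; };
struct demo_dev_iommu { struct demo_iommu_dev *iommu_dev; };
struct demo_device { struct demo_dev_iommu *iommu; };

static const struct demo_ops *demo_dev_iommu_ops(struct demo_device *dev)
{
	/* Valid only after probe succeeded; anything else is a caller bug. */
	assert(dev && dev->iommu && dev->iommu->iommu_dev);
	return dev->iommu->iommu_dev->ops;
}

static const struct demo_ops demo_the_ops = { 0 };

static int demo_lookup_ok(void)
{
	struct demo_iommu_dev idev = { .ops = &demo_the_ops };
	struct demo_dev_iommu di = { .iommu_dev = &idev };
	struct demo_device dev = { .iommu = &di };

	return demo_dev_iommu_ops(&dev) == &demo_the_ops;
}
```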

> +}
> +
>   #define IOMMU_BUS_NOTIFY_PRIORITY		0
>   #define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
>   #define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5230c6d90ece..6631e2ea44df 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(iommu_group_set_name);
>   static int iommu_create_device_direct_mappings(struct iommu_group *group,
>   					       struct device *dev)
>   {
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>   	struct iommu_domain *domain = group->default_domain;
>   	struct iommu_resv_region *entry;
>   	struct list_head mappings;
> @@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>   		dma_addr_t start, end, addr;
>   		size_t map_size = 0;
>   
> -		if (domain->ops->apply_resv_region)
> -			domain->ops->apply_resv_region(dev, domain, entry);
> +		if (ops->apply_resv_region)
> +			ops->apply_resv_region(dev, domain, entry);

Strictly I think this was a domain op, as it was about reserving the 
IOVA range in the given DMA domain. Also taking the domain as an 
argument is a bit of a giveaway. However it's now just dead code either 
way since there are no remaining implementations, and no reason for any 
new ones.

>   
>   		start = ALIGN(entry->start, pg_size);
>   		end   = ALIGN(entry->start + entry->length, pg_size);
> @@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>   static bool iommu_is_attach_deferred(struct iommu_domain *domain,
>   				     struct device *dev)
>   {
> -	if (domain->ops->is_attach_deferred)
> -		return domain->ops->is_attach_deferred(domain, dev);
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> +
> +	if (ops->is_attach_deferred)
> +		return ops->is_attach_deferred(domain, dev);

Similarly if this takes a domain as its first argument then it's de 
facto a domain method. However, I'd concur that logically it *is* a 
device op, so let's drop that (unused) domain argument if we're cleaning up.

Maybe there's even an argument for factoring this out to a standard flag 
in dev_iommu rather than an op at all?
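[Editor's sketch of the "flag in dev_iommu" alternative: the driver sets a boolean once at probe time and the core never needs a callback. The field name is invented here:]

```c
#include <stdbool.h>

/*
 * Invented field: a per-device flag set once at probe time replaces
 * the is_attach_deferred() callback entirely.
 */
struct demo_dev_iommu {
	bool attach_deferred;
};

struct demo_device {
	struct demo_dev_iommu iommu;
};

static bool demo_is_attach_deferred(const struct demo_device *dev)
{
	return dev->iommu.attach_deferred;
}

static bool demo_check(bool deferred)
{
	struct demo_device dev = { .iommu = { .attach_deferred = deferred } };

	return demo_is_attach_deferred(&dev);
}
```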

The others covered here look OK - we can blame PCI for page response 
being weirdly device-centric - however could we also convert all the 
feasible instances of dev->bus->iommu_ops to dev_iommu_ops() as well? 
(Subtly implying that I'm also not a fan of having "_get" in the name 
for a non-refcounted lookup...) Obviously iommu_probe_device() and 
iommmu_domain_alloc() still need bus ops at this point, but I'm working 
on that... :)

Thanks,
Robin.

>   	return false;
>   }
> @@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
>   	struct iommu_fault_event *evt;
>   	struct iommu_fault_page_request *prm;
>   	struct dev_iommu *param = dev->iommu;
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>   	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>   
> -	if (!domain || !domain->ops->page_response)
> +	if (!ops || !ops->page_response)
>   		return -ENODEV;
>   
>   	if (!param || !param->fault_param)
> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>   			msg->pasid = 0;
>   		}
>   
> -		ret = domain->ops->page_response(dev, evt, msg);
> +		ret = ops->page_response(dev, evt, msg);
>   		list_del(&evt->list);
>   		kfree(evt);
>   		break;
> @@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
>   
>   static int iommu_group_do_probe_finalize(struct device *dev, void *data)
>   {
> -	struct iommu_domain *domain = data;
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>   
> -	if (domain->ops->probe_finalize)
> -		domain->ops->probe_finalize(dev);
> +	if (ops->probe_finalize)
> +		ops->probe_finalize(dev);
>   
>   	return 0;
>   }
> @@ -2020,7 +2023,7 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
>   
>   int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
>   {
> -	const struct iommu_ops *ops = domain->ops;
> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>   
>   	if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))
>   		return __iommu_attach_device(domain, dev);

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
                     ` (2 preceding siblings ...)
  2022-01-24 17:55   ` Jason Gunthorpe
@ 2022-01-25  0:57   ` Robin Murphy
  2022-01-25  6:27     ` Lu Baolu
  3 siblings, 1 reply; 53+ messages in thread
From: Robin Murphy @ 2022-01-25  0:57 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Kevin Tian, Ashok Raj, Will Deacon
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 2022-01-24 07:11, Lu Baolu wrote:
> Add a domain specific callback set, domain_ops, for vendor iommu driver
> to provide domain specific operations. Move domain-specific callbacks
> from iommu_ops to the domain_ops and hook them when a domain is allocated.

I think it would make a lot of sense for iommu_domain_ops to be a 
substructure *within* iommu_ops. That way we can save most of the driver 
boilerplate here by not needing extra variables everywhere, and letting 
iommu_domain_alloc() still do a default initialisation like "domain->ops 
= bus->iommu_ops->domain_ops;"
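[Editor's sketch of Robin's substructure suggestion: the domain ops live embedded inside the per-driver ops, so the default init is just taking the address of the substructure. Stand-in names, not the kernel layout:]

```c
#include <stddef.h>

/* Stand-in layout: domain ops embedded in the per-driver ops. */
struct demo_domain_ops {
	int (*map)(void);
};

struct demo_iommu_ops {
	struct demo_domain_ops domain_ops;	/* substructure, one initializer */
};

struct demo_domain { const struct demo_domain_ops *ops; };

static int demo_map7(void)
{
	return 7;
}

static const struct demo_iommu_ops demo_bus_ops = {
	.domain_ops = { .map = demo_map7 },
};

/* Default init mirrors domain->ops = &bus->iommu_ops->domain_ops. */
static void demo_domain_init(struct demo_domain *d,
			     const struct demo_iommu_ops *bus_ops)
{
	d->ops = &bus_ops->domain_ops;
}

static int demo_roundtrip(void)
{
	struct demo_domain d;

	demo_domain_init(&d, &demo_bus_ops);
	return d.ops->map();
}
```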

I do like the future possibility of trying to flatten some of the 
io-pgtable indirection by having the SMMU drivers subsequently swizzle 
in alternate domain ops once they've picked a format, though. That was a 
bit too clunky to achieve at the whole iommu_ops level when I 
experimented with it years ago, but now it might well be worth another 
try...

One other comment below.

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>   include/linux/iommu.h                       | 93 ++++++++++++---------
>   drivers/iommu/amd/iommu.c                   | 21 +++--
>   drivers/iommu/apple-dart.c                  | 24 ++++--
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++--
>   drivers/iommu/arm/arm-smmu/arm-smmu.c       | 23 +++--
>   drivers/iommu/arm/arm-smmu/qcom_iommu.c     | 17 ++--
>   drivers/iommu/exynos-iommu.c                | 17 ++--
>   drivers/iommu/fsl_pamu_domain.c             | 13 ++-
>   drivers/iommu/intel/iommu.c                 | 25 ++++--
>   drivers/iommu/iommu.c                       | 15 ++--
>   drivers/iommu/ipmmu-vmsa.c                  | 21 +++--
>   drivers/iommu/msm_iommu.c                   | 17 ++--
>   drivers/iommu/mtk_iommu.c                   | 24 ++++--
>   drivers/iommu/mtk_iommu_v1.c                | 19 +++--
>   drivers/iommu/omap-iommu.c                  | 15 ++--
>   drivers/iommu/rockchip-iommu.c              | 17 ++--
>   drivers/iommu/s390-iommu.c                  | 15 ++--
>   drivers/iommu/sprd-iommu.c                  | 19 +++--
>   drivers/iommu/sun50i-iommu.c                | 18 ++--
>   drivers/iommu/tegra-gart.c                  | 15 ++--
>   drivers/iommu/tegra-smmu.c                  | 16 ++--
>   drivers/iommu/virtio-iommu.c                | 18 ++--
>   22 files changed, 305 insertions(+), 179 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 111b3e9c79bb..33c5c0e5c465 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -88,7 +88,7 @@ struct iommu_domain_geometry {
>   
>   struct iommu_domain {
>   	unsigned type;
> -	const struct iommu_ops *ops;
> +	const struct domain_ops *ops;
>   	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
>   	iommu_fault_handler_t handler;
>   	void *handler_token;
> @@ -192,26 +192,11 @@ struct iommu_iotlb_gather {
>    * struct iommu_ops - iommu ops and capabilities
>    * @capable: check capability
>    * @domain_alloc: allocate iommu domain
> - * @domain_free: free iommu domain
> - * @attach_dev: attach device to an iommu domain
> - * @detach_dev: detach device from an iommu domain
> - * @map: map a physically contiguous memory region to an iommu domain
> - * @map_pages: map a physically contiguous set of pages of the same size to
> - *             an iommu domain.
> - * @unmap: unmap a physically contiguous memory region from an iommu domain
> - * @unmap_pages: unmap a number of pages of the same size from an iommu domain
> - * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
> - * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
> - * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
> - *            queue
> - * @iova_to_phys: translate iova to physical address
>    * @probe_device: Add device to iommu driver handling
>    * @release_device: Remove device from iommu driver handling
>    * @probe_finalize: Do final setup work after the device is added to an IOMMU
>    *                  group and attached to the groups domain
>    * @device_group: find iommu group for a particular device
> - * @enable_nesting: Enable nesting
> - * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>    * @get_resv_regions: Request list of reserved regions for a device
>    * @put_resv_regions: Free list of reserved regions for a device
>    * @apply_resv_region: Temporary helper call-back for iova reserved ranges
> @@ -237,33 +222,11 @@ struct iommu_ops {
>   
>   	/* Domain allocation and freeing by the iommu driver */
>   	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> -	void (*domain_free)(struct iommu_domain *);
>   
> -	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> -	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> -	int (*map)(struct iommu_domain *domain, unsigned long iova,
> -		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> -	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
> -			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
> -			 int prot, gfp_t gfp, size_t *mapped);
> -	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> -		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
> -	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
> -			      size_t pgsize, size_t pgcount,
> -			      struct iommu_iotlb_gather *iotlb_gather);
> -	void (*flush_iotlb_all)(struct iommu_domain *domain);
> -	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
> -			       size_t size);
> -	void (*iotlb_sync)(struct iommu_domain *domain,
> -			   struct iommu_iotlb_gather *iotlb_gather);
> -	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
>   	struct iommu_device *(*probe_device)(struct device *dev);
>   	void (*release_device)(struct device *dev);
>   	void (*probe_finalize)(struct device *dev);
>   	struct iommu_group *(*device_group)(struct device *dev);
> -	int (*enable_nesting)(struct iommu_domain *domain);
> -	int (*set_pgtable_quirks)(struct iommu_domain *domain,
> -				  unsigned long quirks);
>   
>   	/* Request/Free a list of reserved regions for a device */
>   	void (*get_resv_regions)(struct device *dev, struct list_head *list);
> @@ -296,6 +259,60 @@ struct iommu_ops {
>   	struct module *owner;
>   };
>   
> +/**
> + * struct domain_ops - per-domain ops
> + * @attach_dev: attach an iommu domain to a device
> + * @detach_dev: detach an iommu domain from a device
> + * @map: map a physically contiguous memory region to an iommu domain
> + * @map_pages: map a physically contiguous set of pages of the same size to
> + *             an iommu domain.
> + * @unmap: unmap a physically contiguous memory region from an iommu domain
> + * @unmap_pages: unmap a number of pages of the same size from an iommu domain
> + * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
> + * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
> + * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
> + *            queue
> + * @iova_to_phys: translate iova to physical address
> + * @enable_nesting: Enable nesting
> + * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
> + * @release: Release the domain after use.

Not sure about the name change here - it's still logically a "free" that 
matches an "alloc", we're not doing anything clever like refcounting or 
devres, which is what I'd tend to associate with a "release" function 
for an allocated resource.

Robin.

> + */
> +struct domain_ops {
> +	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> +	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> +
> +	int (*map)(struct iommu_domain *domain, unsigned long iova,
> +		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> +	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
> +			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
> +			 int prot, gfp_t gfp, size_t *mapped);
> +	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> +			size_t size, struct iommu_iotlb_gather *iotlb_gather);
> +	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
> +			      size_t pgsize, size_t pgcount,
> +			      struct iommu_iotlb_gather *iotlb_gather);
> +
> +	void (*flush_iotlb_all)(struct iommu_domain *domain);
> +	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
> +			       size_t size);
> +	void (*iotlb_sync)(struct iommu_domain *domain,
> +			   struct iommu_iotlb_gather *iotlb_gather);
> +
> +	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
> +
> +	int (*enable_nesting)(struct iommu_domain *domain);
> +	int (*set_pgtable_quirks)(struct iommu_domain *domain,
> +				  unsigned long quirks);
> +
> +	void (*release)(struct iommu_domain *domain);
> +};

* RE: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24 17:17     ` Jason Gunthorpe
@ 2022-01-25  0:58       ` Tian, Kevin
  0 siblings, 0 replies; 53+ messages in thread
From: Tian, Kevin @ 2022-01-25  0:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Raj, Ashok, David Airlie, Robin Murphy, Pan, Jacob jun,
	Christoph Hellwig, Alex Williamson, Thierry Reding, Ben Skeggs,
	Daniel Vetter, Jonathan Hunter, Will Deacon, linux-kernel

> From: Jason Gunthorpe via iommu
> Sent: Tuesday, January 25, 2022 1:18 AM
> 
> On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <baolu.lu@linux.intel.com>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > > +/**
> > > + * struct domain_ops - per-domain ops
> > > + * @attach_dev: attach an iommu domain to a device
> > > + * @detach_dev: detach an iommu domain from a device
> >
> > What is the criteria about whether an op should be iommu_ops or
> domain_ops
> > when it requires both domain and device pointers like above two (and
> future
> > PASID-based attach)?
> >
> > Other examples include:
> > 	@apply_resv_region
> 
> For apply_resv_region the 'dev' argument is really selecting a device
> that is already attached to the domain, so it should be in the domain
> ops.

Looks like this one can just be removed. No iommu driver implements it today.

> 
> > 	@is_attach_deferred
> 
> Only two drivers implement this and neither use the domain
> argument. Remove the domain argument and keep it in the iommu_ops
> 

Yes, especially since neither of the two drivers actually uses the domain.

Generally it makes sense to me to put all attach-related callbacks in
iommu_ops, since they are all for building connections between a device
and a domain, and such a connection is device specific (just like how a
vfio device is connected to iommufd).

This in concept also applies to @enable_nesting, which you mentioned
in another thread. Nesting is a per-device configuration between multiple
domains attached by the same device.

Thanks
Kevin

* RE: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24 17:36   ` Jason Gunthorpe
@ 2022-01-25  1:11     ` Tian, Kevin
  2022-01-25  1:52       ` Robin Murphy
  2022-01-25  3:18     ` Lu Baolu
  1 sibling, 1 reply; 53+ messages in thread
From: Tian, Kevin @ 2022-01-25  1:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu
  Cc: Raj, Ashok, David Airlie, Robin Murphy, Jonathan Hunter,
	Christoph Hellwig, Alex Williamson, Thierry Reding, Ben Skeggs,
	Daniel Vetter, Will Deacon, linux-kernel, Pan, Jacob jun

> From: Jason Gunthorpe via iommu
> Sent: Tuesday, January 25, 2022 1:37 AM
> > @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
> >  			msg->pasid = 0;
> >  		}
> >
> > -		ret = domain->ops->page_response(dev, evt, msg);
> > +		ret = ops->page_response(dev, evt, msg);
> >  		list_del(&evt->list);
> >  		kfree(evt);
> >  		break;
> 
> Feels weird that page_response is not connected to a domain, the fault
> originated from a domain after all. I would say this op should be
> moved to the domain and the caller should provide the a pointer to the
> domain that originated the fault.
> 

In concept, yes.

But currently the entire SVA path is not associated with a domain. This
was one mistake, as we discussed in the cover letter. Before extending
the iommu domain to cover CPU page tables we may have to leave it in
iommu_ops, given this series is just for cleanup...

Thanks
Kevin

* RE: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-24 17:44   ` Jason Gunthorpe
@ 2022-01-25  1:11     ` Tian, Kevin
  2022-01-25 14:48     ` Robin Murphy
  1 sibling, 0 replies; 53+ messages in thread
From: Tian, Kevin @ 2022-01-25  1:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Joerg Roedel, Christoph Hellwig, Ben Skeggs, Raj,
	Ashok, Will Deacon, Robin Murphy, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, January 25, 2022 1:44 AM
> 
> On Mon, Jan 24, 2022 at 09:46:26AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <baolu.lu@linux.intel.com>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > >
> > > Hi,
> > >
> > > The guest pasid and aux-domain related code are dead code in current
> > > iommu subtree. As we have reached a consensus that all these features
> > > should be based on the new iommufd framework (which is under active
> > > development), the first part of this series removes and cleanups all
> > > the dead code.
> > >
> > > The second part of this series refactors the iommu_domain by moving all
> > > domain-specific ops from iommu_ops to a new domain_ops. This makes
> an
> > > iommu_domain self-contained and represent the abstraction of an I/O
> > > translation table in the IOMMU subsystem. With different type of
> > > iommu_domain providing different set of ops, it's easier to support more
> > > types of I/O translation tables.
> >
> > You may want to give more background on this end goal. In general there
> > are four IOPT types in iommufd discussions:
> >
> > 1) The one currently tracked by iommu_domain, with a map/unmap
> semantics
> > 2) The one managed by mm and shared to iommu via sva_bind/unbind ops
> > 3) The one managed by userspace and bound to iommu via iommufd
> (require nesting)
> > 4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD
> interface
> 
> Yes, at least from an iommufd perspective I'd like to see one struct
> for all of these types, mainly so we can have a uniform alloc, attach
> and detach flow for all io page table types.
> 
> If we want to use the iommu_domain, or make iommu_domain a sub-class
> of a new struct, can be determined as we go along.
> 
> Regardless, I think this cleanup stands on its own. Moving the ops and
> purging the dead code is clearly the right thing to do.
> 

Indeed!

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-25  1:11     ` Tian, Kevin
@ 2022-01-25  1:52       ` Robin Murphy
  2022-01-25  2:08         ` Tian, Kevin
  0 siblings, 1 reply; 53+ messages in thread
From: Robin Murphy @ 2022-01-25  1:52 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Lu Baolu
  Cc: Raj, Ashok, David Airlie, Jonathan Hunter, Christoph Hellwig,
	Alex Williamson, Thierry Reding, Ben Skeggs, Daniel Vetter,
	Will Deacon, linux-kernel, Pan, Jacob jun

On 2022-01-25 01:11, Tian, Kevin wrote:
>> From: Jason Gunthorpe via iommu
>> Sent: Tuesday, January 25, 2022 1:37 AM
>>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>>>   			msg->pasid = 0;
>>>   		}
>>>
>>> -		ret = domain->ops->page_response(dev, evt, msg);
>>> +		ret = ops->page_response(dev, evt, msg);
>>>   		list_del(&evt->list);
>>>   		kfree(evt);
>>>   		break;
>>
>> Feels weird that page_response is not connected to a domain, the fault
>> originated from a domain after all. I would say this op should be
>> moved to the domain and the caller should provide a pointer to the
>> domain that originated the fault.
>>
> 
> In concept yes.

Not even that, really. It's true that the "fault" itself is logically 
associated with the domain, but we never see that - the ATS request and 
response which encapsulate it all happen automatically on the PCI side. 
It's the endpoint that then decides to handle ATS translation failure 
via PRI, so all we actually get is a page request message from a 
RID/PASID, which most definitely represents the "device" (and in fact 
having to work backwards from there to figure out which domain/context 
it is currently attached to can be a bit of a pain). Similarly the 
response is a message directly back to the device itself - an operation 
on a domain may (or may not) have happened off the back of receiving the 
initial request, but even if the content of the response is to reflect 
that, the operation of responding is clearly focused on the device.

I fully agree that it's a weird-looking model, but that's how PCI SIG 
made it - and no IOMMU architecture seems to have tried to wrap it up in 
anything nicer either - so I don't see that we'd gain much from trying 
to pretend otherwise :)

Robin.

> But currently the entire sva path is not associated with domain. This was
> one mistake as we discussed in the cover letter. Before extending iommu
> domain to cover CPU page tables we may have to leave it in iommu_ops
> given this series is just for cleanup...
> 
> Thanks
> Kevin

* RE: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-25  1:52       ` Robin Murphy
@ 2022-01-25  2:08         ` Tian, Kevin
  2022-01-25  3:31           ` Lu Baolu
  0 siblings, 1 reply; 53+ messages in thread
From: Tian, Kevin @ 2022-01-25  2:08 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe, Lu Baolu
  Cc: Raj, Ashok, David Airlie, Jonathan Hunter, Christoph Hellwig,
	Alex Williamson, Thierry Reding, Ben Skeggs, Daniel Vetter,
	Will Deacon, linux-kernel, Pan, Jacob jun

> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Tuesday, January 25, 2022 9:52 AM
> 
> On 2022-01-25 01:11, Tian, Kevin wrote:
> >> From: Jason Gunthorpe via iommu
> >> Sent: Tuesday, January 25, 2022 1:37 AM
> >>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
> >>>   			msg->pasid = 0;
> >>>   		}
> >>>
> >>> -		ret = domain->ops->page_response(dev, evt, msg);
> >>> +		ret = ops->page_response(dev, evt, msg);
> >>>   		list_del(&evt->list);
> >>>   		kfree(evt);
> >>>   		break;
> >>
> >> Feels weird that page_response is not connected to a domain, the fault
> >> originated from a domain after all. I would say this op should be
> >> moved to the domain and the caller should provide a pointer to the
> >> domain that originated the fault.
> >>
> >
> > In concept yes.
> 
> Not even that, really. It's true that the "fault" itself is logically
> associated with the domain, but we never see that - the ATS request and
> response which encapsulate it all happen automatically on the PCI side.
> It's the endpoint that then decides to handle ATS translation failure
> via PRI, so all we actually get is a page request message from a
> RID/PASID, which most definitely represents the "device" (and in fact
> having to work backwards from there to figure out which domain/context
> it is currently attached to can be a bit of a pain). Similarly the
> response is a message directly back to the device itself - an operation
> on a domain may (or may not) have happened off the back of receiving the
> initial request, but even if the content of the response is to reflect
> that, the operation of responding is clearly focused on the device.
> 
> I fully agree that it's a weird-looking model, but that's how PCI SIG
> made it - and no IOMMU architecture seems to have tried to wrap it up in
> anything nicer either - so I don't see that we'd gain much from trying
> to pretend otherwise :)
> 

I think the point here is that although page requests are received
per device from the wire, the low-level iommu driver should convert
those requests into domain-wide requests (with the RID/PASID recorded
as private data in the request), which can then be handled by the domain
ops in the iommu core. Once a domain-wide request is completed by
the iommu core, the low-level iommu driver retrieves the RID/PASID
information from the private data of the completed request and triggers
the page response per RID/PASID in a bus-specific way.

Does it sound reasonable?

Thanks
Kevin


* Re: [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain
  2022-01-24  9:29   ` Christoph Hellwig
@ 2022-01-25  2:59     ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  2:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, Joerg Roedel, Jason Gunthorpe, Ben Skeggs, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Alex Williamson,
	Eric Auger, Liu Yi L, Jacob jun Pan, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On 1/24/22 5:29 PM, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:00PM +0800, Lu Baolu wrote:
>> The supported page sizes of an iommu_domain are saved in the pgsize_bitmap
>> field. Retrieve the value from the right place.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
>> Link: https://lore.kernel.org/r/20211218074546.1772553-1-baolu.lu@linux.intel.com
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Wouldn't it make sense to remove the pgsize_bitmap in struct iommu_ops
> and initialize the domain field in the domain_alloc methods?  Or am I
> missing something?
> 

It looks reasonable to me. The pgsize_bitmap is an attribute of a domain
that provides the map/unmap interfaces. It could be moved out of the
iommu_ops structure.

Best regards,
baolu

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  9:32   ` Christoph Hellwig
@ 2022-01-25  3:01     ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  3:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, Joerg Roedel, Jason Gunthorpe, Ben Skeggs, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Alex Williamson,
	Eric Auger, Liu Yi L, Jacob jun Pan, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On 1/24/22 5:32 PM, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
>> The common iommu_ops is hooked to both device and domain. When a helper
>> has both device and domain pointer, the way to get the iommu_ops looks
>> messy in iommu core. This sorts out the way to get iommu_ops. The device
>> related helpers go through device pointer, while the domain related ones
>> go through domain pointer.
> 
> Ugg. This really sounds like we should have a different structures for
> each set of ops?
>

Yes. This is done in the following patch.

Best regards,
baolu

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24  9:48   ` Tian, Kevin
@ 2022-01-25  3:04     ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  3:04 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy
  Cc: baolu.lu, Alex Williamson, Eric Auger, Liu, Yi L, Pan, Jacob jun,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 1/24/22 5:48 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, January 24, 2022 3:11 PM
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index aa5486243892..111b3e9c79bb 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct
>> iommu_iotlb_gather *gather)
>>   	};
>>   }
>>
>> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device
>> *dev)
> 
> dev_get_iommu_ops or just dev_iommu_ops?

dev_iommu_ops() looks better.

Best regards,
baolu

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-24 17:36   ` Jason Gunthorpe
  2022-01-25  1:11     ` Tian, Kevin
@ 2022-01-25  3:18     ` Lu Baolu
  1 sibling, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  3:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 1/25/22 1:36 AM, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
>> The common iommu_ops is hooked to both device and domain. When a helper
>> has both device and domain pointer, the way to get the iommu_ops looks
>> messy in iommu core. This sorts out the way to get iommu_ops. The device
>> related helpers go through device pointer, while the domain related ones
>> go through domain pointer.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>   include/linux/iommu.h |  8 ++++++++
>>   drivers/iommu/iommu.c | 25 ++++++++++++++-----------
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index aa5486243892..111b3e9c79bb 100644
>> +++ b/include/linux/iommu.h
>> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
>>   	};
>>   }
>>   
>> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
>> +{
>> +	if (dev && dev->iommu && dev->iommu->iommu_dev)
>> +		return dev->iommu->iommu_dev->ops;
>> +
>> +	return NULL;
> 
> What is the purpose of this helper?

It gets the iommu_ops from a device pointer. I just want to avoid
duplicating code in various functions.

> 
>> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>>   	struct iommu_domain *domain = group->default_domain;
>>   	struct iommu_resv_region *entry;
>>   	struct list_head mappings;
>> @@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>>   		dma_addr_t start, end, addr;
>>   		size_t map_size = 0;
>>   
>> -		if (domain->ops->apply_resv_region)
>> -			domain->ops->apply_resv_region(dev, domain, entry);
>> +		if (ops->apply_resv_region)
>> +			ops->apply_resv_region(dev, domain, entry);
> 
> Here we call it and don't check for NULL? So why did we check the
> interior pointers in the helper?

Yes. Should check.

> 
>> @@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>>   static bool iommu_is_attach_deferred(struct iommu_domain *domain,
>>   				     struct device *dev)
>>   {
>> -	if (domain->ops->is_attach_deferred)
>> -		return domain->ops->is_attach_deferred(domain, dev);
>> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>> +
>> +	if (ops->is_attach_deferred)
>> +		return ops->is_attach_deferred(domain, dev);
> 
> Same here, at least return false if ops is null..

Yes.

>    
>> @@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
>>   	struct iommu_fault_event *evt;
>>   	struct iommu_fault_page_request *prm;
>>   	struct dev_iommu *param = dev->iommu;
>> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>>   	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>>   
>> -	if (!domain || !domain->ops->page_response)
>> +	if (!ops || !ops->page_response)
>>   		return -ENODEV;
>>   
>>   	if (!param || !param->fault_param)
>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>>   			msg->pasid = 0;
>>   		}
>>   
>> -		ret = domain->ops->page_response(dev, evt, msg);
>> +		ret = ops->page_response(dev, evt, msg);
>>   		list_del(&evt->list);
>>   		kfree(evt);
>>   		break;
> 
> Feels weird that page_response is not connected to a domain, the fault
> originated from a domain after all. I would say this op should be
> moved to the domain and the caller should provide a pointer to the
> domain that originated the fault.
> 
> Ideally since only some domain's will be configured to handle faults
> at all - domains that can't do this should have a NULL page_response
> op, even if other domains created by the same device driver could
> handle page_response..
> 
>> @@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
>>   
>>   static int iommu_group_do_probe_finalize(struct device *dev, void *data)
>>   {
>> -	struct iommu_domain *domain = data;
>> +	const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>>   
>> -	if (domain->ops->probe_finalize)
>> -		domain->ops->probe_finalize(dev);
>> +	if (ops->probe_finalize)
>> +		ops->probe_finalize(dev);
> 
> This is an oddball one too, it is finishing setting up the default
> domain for a device? Several drivers seem to recover the default
> domain in their implementations..

In order to avoid a default domain type (DMA or IDENTITY) conflict among
devices in the same iommu_group, the probe process is divided into two
phases, one for determining the default domain type and the other for
allocating the default domain and attaching it to the device.

ops->probe_finalize() is called to tell the vendor iommu driver that
the device probe is done. Normally the vendor iommu driver can set the
dma ops in this callback.

> 
> Jason
> 

Best regards,
baolu

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-25  2:08         ` Tian, Kevin
@ 2022-01-25  3:31           ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  3:31 UTC (permalink / raw)
  To: Tian, Kevin, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Raj, Ashok, David Airlie, Jonathan Hunter,
	Christoph Hellwig, Alex Williamson, Thierry Reding, Ben Skeggs,
	Daniel Vetter, Will Deacon, linux-kernel, Pan, Jacob jun

On 1/25/22 10:08 AM, Tian, Kevin wrote:
>> From: Robin Murphy <robin.murphy@arm.com>
>> Sent: Tuesday, January 25, 2022 9:52 AM
>>
>> On 2022-01-25 01:11, Tian, Kevin wrote:
>>>> From: Jason Gunthorpe via iommu
>>>> Sent: Tuesday, January 25, 2022 1:37 AM
>>>>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>>>>>    			msg->pasid = 0;
>>>>>    		}
>>>>>
>>>>> -		ret = domain->ops->page_response(dev, evt, msg);
>>>>> +		ret = ops->page_response(dev, evt, msg);
>>>>>    		list_del(&evt->list);
>>>>>    		kfree(evt);
>>>>>    		break;
>>>>
>>>> Feels weird that page_response is not connected to a domain, the fault
>>>> originated from a domain after all. I would say this op should be
>>>> moved to the domain and the caller should provide a pointer to the
>>>> domain that originated the fault.
>>>>
>>>
>>> In concept yes.
>>
>> Not even that, really. It's true that the "fault" itself is logically
>> associated with the domain, but we never see that - the ATS request and
>> response which encapsulate it all happen automatically on the PCI side.
>> It's the endpoint that then decides to handle ATS translation failure
>> via PRI, so all we actually get is a page request message from a
>> RID/PASID, which most definitely represents the "device" (and in fact
>> having to work backwards from there to figure out which domain/context
>> it is currently attached to can be a bit of a pain). Similarly the
>> response is a message directly back to the device itself - an operation
>> on a domain may (or may not) have happened off the back of receiving the
>> initial request, but even if the content of the response is to reflect
>> that, the operation of responding is clearly focused on the device.
>>
>> I fully agree that it's a weird-looking model, but that's how PCI SIG
>> made it - and no IOMMU architecture seems to have tried to wrap it up in
>> anything nicer either - so I don't see that we'd gain much from trying
>> to pretend otherwise :)
>>
> 
> I think the point here is that although page requests are received
> per device from the wire the low level iommu driver should convert
> those requests into domain-wide requests (with RID/PASID recorded
> as private data in the request) which then can be handled by domain
> ops in iommu core. Once a domain-wide request is completed by
> the iommu core, the low level iommu driver then retrieves RID/PASID
> information from private data of the completed request and triggers
> page response per RID/PASID in bus specific way.

I also have a pending series to associate SVA with an iommu domain
and make the existing I/O page fault framework generic (vs. SVA
specific). Perhaps we can discuss the page fault handling/response there
with the real code.
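
The flow described above can be sketched standalone. This is a mock with
made-up types and names (mock_domain, page_request_from_wire, etc.), not
kernel code: the driver records RID/PASID as private data, the core
handles the fault domain-centrically, and the driver responds
device-centrically.

```c
#include <assert.h>
#include <stdio.h>

/* Mock types standing in for kernel structures -- illustrative only. */
struct mock_fault {
	unsigned int rid;	/* requester ID from the wire */
	unsigned int pasid;	/* PASID from the page request */
	unsigned long addr;	/* faulting IOVA */
	int handled;
};

struct mock_domain {
	/* domain op: resolve the fault against this domain's page table */
	int (*handle_fault)(struct mock_domain *d, struct mock_fault *f);
};

static int demo_handle_fault(struct mock_domain *d, struct mock_fault *f)
{
	(void)d;
	f->handled = 1;		/* pretend the page was made resident */
	return 0;
}

/* Low-level driver: convert a wire page request into a domain-wide
 * request, then respond per RID/PASID once the core completes it. */
static int page_request_from_wire(struct mock_domain *d,
				  unsigned int rid, unsigned int pasid,
				  unsigned long addr)
{
	struct mock_fault f = { .rid = rid, .pasid = pasid, .addr = addr };
	int ret = d->handle_fault(d, &f);	/* core, domain-centric */

	/* bus-specific response, device-centric again */
	printf("page response to %04x/%u: %s\n", f.rid, f.pasid,
	       ret ? "failure" : "success");
	return ret;
}
```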

> 
> Does it sound reasonable?
> 
> Thanks
> Kevin
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops
  2022-01-25  0:20   ` Robin Murphy
@ 2022-01-25  3:54     ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  3:54 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Kevin Tian, Ashok Raj, Will Deacon
  Cc: baolu.lu, David Airlie, linux-kernel, iommu, Jonathan Hunter,
	Alex Williamson, Thierry Reding, Jacob jun Pan, Daniel Vetter


On 1/25/22 8:20 AM, Robin Murphy wrote:
> On 2022-01-24 07:11, Lu Baolu wrote:
>> The common iommu_ops is hooked to both device and domain. When a helper
>> has both device and domain pointer, the way to get the iommu_ops looks
>> messy in iommu core. This sorts out the way to get iommu_ops. The device
>> related helpers go through device pointer, while the domain related ones
>> go through domain pointer.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   include/linux/iommu.h |  8 ++++++++
>>   drivers/iommu/iommu.c | 25 ++++++++++++++-----------
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index aa5486243892..111b3e9c79bb 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct 
>> iommu_iotlb_gather *gather)
>>       };
>>   }
>> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device 
>> *dev)
>> +{
>> +    if (dev && dev->iommu && dev->iommu->iommu_dev)
>> +        return dev->iommu->iommu_dev->ops;
>> +
>> +    return NULL;
> 
> This probably warrants at least a WARN, but it's arguable to just assume 
> that valid ops must be installed if iommu_probe_device() has succeeded. 
> The device ops are essentially for internal use within the IOMMU 
> subsystem itself, so we should be able to trust ourselves not to misuse 
> the helper.

I agree that we could add a WARN() here. The expectation is that every
device going through the IOMMU interfaces or helpers should have been
probed by iommu_probe_device().
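
As a standalone illustration of the helper plus the suggested WARN (mock
types and a printf stand-in for WARN(); the real helper lives in
include/linux/iommu.h and uses kernel types):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mock of the dev_iommu_ops() helper under discussion: trust that
 * iommu_probe_device() has run, but warn loudly if it has not. */
struct mock_iommu_ops { int dummy; };
struct mock_iommu_device { const struct mock_iommu_ops *ops; };
struct mock_dev_iommu { struct mock_iommu_device *iommu_dev; };
struct mock_device { struct mock_dev_iommu *iommu; };

static const struct mock_iommu_ops *dev_iommu_ops(struct mock_device *dev)
{
	if (dev && dev->iommu && dev->iommu->iommu_dev)
		return dev->iommu->iommu_dev->ops;

	/* stand-in for WARN(): reaching here means a device entered the
	 * IOMMU interfaces without going through iommu_probe_device() */
	fprintf(stderr, "WARN: dev_iommu_ops on unprobed device\n");
	return NULL;
}
```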

> 
>> +}
>> +
>>   #define IOMMU_BUS_NOTIFY_PRIORITY        0
>>   #define IOMMU_GROUP_NOTIFY_ADD_DEVICE        1 /* Device added */
>>   #define IOMMU_GROUP_NOTIFY_DEL_DEVICE        2 /* Pre Device removed */
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 5230c6d90ece..6631e2ea44df 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(iommu_group_set_name);
>>   static int iommu_create_device_direct_mappings(struct iommu_group 
>> *group,
>>                              struct device *dev)
>>   {
>> +    const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>>       struct iommu_domain *domain = group->default_domain;
>>       struct iommu_resv_region *entry;
>>       struct list_head mappings;
>> @@ -785,8 +786,8 @@ static int 
>> iommu_create_device_direct_mappings(struct iommu_group *group,
>>           dma_addr_t start, end, addr;
>>           size_t map_size = 0;
>> -        if (domain->ops->apply_resv_region)
>> -            domain->ops->apply_resv_region(dev, domain, entry);
>> +        if (ops->apply_resv_region)
>> +            ops->apply_resv_region(dev, domain, entry);
> 
> Strictly I think this was a domain op, as it was about reserving the 
> IOVA range in the given DMA domain. Also taking the domain as an 
> argument is a bit of a giveaway. However it's now just dead code either 
> way since there are no remaining implementations, and no reason for any 
> new ones.

This callback is dead code. I will clean it up.

$ git grep apply_resv_region
drivers/iommu/iommu.c:          if (ops->apply_resv_region)
drivers/iommu/iommu.c:                  ops->apply_resv_region(dev, domain, entry);
include/linux/iommu.h: * @apply_resv_region: Temporary helper call-back for iova reserved ranges
include/linux/iommu.h:  void (*apply_resv_region)(struct device *dev,

> 
>>           start = ALIGN(entry->start, pg_size);
>>           end   = ALIGN(entry->start + entry->length, pg_size);
>> @@ -831,8 +832,10 @@ static int 
>> iommu_create_device_direct_mappings(struct iommu_group *group,
>>   static bool iommu_is_attach_deferred(struct iommu_domain *domain,
>>                        struct device *dev)
>>   {
>> -    if (domain->ops->is_attach_deferred)
>> -        return domain->ops->is_attach_deferred(domain, dev);
>> +    const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>> +
>> +    if (ops->is_attach_deferred)
>> +        return ops->is_attach_deferred(domain, dev);
> 
> Similarly if this takes a domain as its first argument then it's de 
> facto a domain method. However, I'd concur that logically it *is* a 
> device op, so let's drop that (unused) domain argument if we're cleaning 
> up.
> 
> Maybe there's even an argument for factoring this out to a standard flag 
> in dev_iommu rather than an op at all?

Making it part of dev_iommu looks more attractive. Let me check how much
effort it will take. If a lot of changes are required, maybe we can
remove @domain in this series and then switch to a dev_iommu flag in a
separate series.
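
A rough standalone sketch of that follow-up direction (mock types and a
hypothetical field layout, not a posted patch): the driver sets a flag
once at probe time and the core no longer needs the op or the domain
argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of dev_iommu carrying attach_deferred as plain data. */
struct mock_dev_iommu {
	bool attach_deferred;	/* set once by the driver at probe time */
};

struct mock_device {
	struct mock_dev_iommu *iommu;
};

/* core-side check: no callback, no domain argument */
static bool mock_is_attach_deferred(struct mock_device *dev)
{
	return dev->iommu && dev->iommu->attach_deferred;
}
```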

> 
> The others covered here look OK - we can blame PCI for page response 
> being weirdly device-centric - however could we also convert all the 
> feasible instances of dev->bus->iommu_ops to dev_iommu_ops() as well?

Sure.

> (Subtly implying that I'm also not a fan of having "_get" in the name 
> for a non-refcounted lookup...) Obviously iommu_probe_device() and 
> iommmu_domain_alloc() still need bus ops at this point, but I'm working 
> on that... :)

Thanks and glad to know that.

> 
> Thanks,
> Robin.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  9:37   ` Christoph Hellwig
  2022-01-24 17:24     ` Jason Gunthorpe
@ 2022-01-25  4:42     ` Lu Baolu
  1 sibling, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  4:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, Joerg Roedel, Jason Gunthorpe, Ben Skeggs, Kevin Tian,
	Ashok Raj, Will Deacon, Robin Murphy, Alex Williamson,
	Eric Auger, Liu Yi L, Jacob jun Pan, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On 1/24/22 5:37 PM, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>> to provide domain specific operations. Move domain-specific callbacks
>> from iommu_ops to the domain_ops and hook them when a domain is allocated.
> 
> Ah, that's what I meant earlier.  Perfect!
> 
> Nit:  I don't think vendor is the right term here.
> 
> Maybe something like:
> 
> iommu: split struct iommu_ops
> 
> Move the domain specific operations out of struct iommu_ops into a new
> structure that only has domain specific operations.  This solves
> the problem of needing to know if the method vector for a given
> operation needs to be retrieved from the device or the domain.

Sure. Will use above description.

> 
>> +struct domain_ops {
> 
> This needs to keep an iommu in the name, i.e. iommu_domain_ops.

Sure.

> 
>> +	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
> 
> Overly long line.

./scripts/checkpatch.pl --strict *.patch

didn't give me a WARN or CHECK. I will shorten it anyway.

> 
>> +static inline void iommu_domain_set_ops(struct iommu_domain *domain,
>> +					const struct domain_ops *ops)
>> +{
>> +	domain->ops = ops;
>> +}
> 
> Do we really need this helper?

Unnecessary. I can set the pointer directly in the drivers.

> 
>> +static const struct domain_ops amd_domain_ops;
> 
> Can we avoid these forward declarations or would that cause too much
> churn?
> 

I don't like this either. But it's common to put the ops at the bottom
of the file in almost all iommu drivers.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24 17:24     ` Jason Gunthorpe
@ 2022-01-25  4:43       ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  4:43 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: baolu.lu, Joerg Roedel, Ben Skeggs, Kevin Tian, Ashok Raj,
	Will Deacon, Robin Murphy, Alex Williamson, Eric Auger, Liu Yi L,
	Jacob jun Pan, David Airlie, Daniel Vetter, Thierry Reding,
	Jonathan Hunter, iommu, linux-kernel

On 1/25/22 1:24 AM, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 01:37:21AM -0800, Christoph Hellwig wrote:
>> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
>>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>>> to provide domain specific operations. Move domain-specific callbacks
>>> from iommu_ops to the domain_ops and hook them when a domain is allocated.
>>
>> Ah, that's what I meant earlier.  Perfect!
>>
>> Nit:  I don't think vendor is the right term here.
> 
> +1 please don't use the confusing 'vendor' in the kernel, if you feel
> you want to write 'vendor' writing 'device driver' is usually a good
> choice..

Sure!

> 
> This whole series is nice, the few small notes aside, thanks for
> making it!

Thank you!

> 
> Jason
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24  9:58   ` Tian, Kevin
  2022-01-24 10:16     ` Jean-Philippe Brucker
  2022-01-24 17:17     ` Jason Gunthorpe
@ 2022-01-25  4:59     ` Lu Baolu
  2022-01-25 12:37       ` Jason Gunthorpe
  2 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  4:59 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy
  Cc: baolu.lu, Alex Williamson, Eric Auger, Liu, Yi L, Pan, Jacob jun,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 1/24/22 5:58 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, January 24, 2022 3:11 PM
>> +/**
>> + * struct domain_ops - per-domain ops
>> + * @attach_dev: attach an iommu domain to a device
>> + * @detach_dev: detach an iommu domain from a device
> 
> What is the criteria about whether an op should be iommu_ops or domain_ops
> when it requires both domain and device pointers like above two (and future
> PASID-based attach)?

Generally ops belong to iommu_ops if they are only device oriented, and
domain_ops if they are only domain oriented. But it's really hard to
judge when both device and domain are involved. Good question. :-)

Perhaps we should leave the attach/detach interfaces in iommu_ops since
they are related to device capabilities. For example, some devices
support attach_dev_pasid, while others not.

> 
> Other examples include:
> 	@apply_resv_region

This will be deprecated.

> 	@is_attach_deferred

Should be at least device-centric (the domain doesn't play any role
here). A further step is to save is_attach_deferred as a flag in
dev_iommu and remove the op (as Robin suggested).

> 
> Thanks
> Kevin
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24 17:55   ` Jason Gunthorpe
@ 2022-01-25  5:04     ` Lu Baolu
  0 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  5:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy,
	Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 1/25/22 1:55 AM, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
>> -	int (*enable_nesting)(struct iommu_domain *domain);
> 
> 
> Lu, there is an implementation in the Intel driver here, is it usable
> at all?

It's useless and I have cleaned it up in this series.

> 
> Jason
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-25  0:57   ` Robin Murphy
@ 2022-01-25  6:27     ` Lu Baolu
  2022-01-25 14:23       ` Robin Murphy
  0 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-25  6:27 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Kevin Tian, Ashok Raj, Will Deacon
  Cc: baolu.lu, Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 1/25/22 8:57 AM, Robin Murphy wrote:
> On 2022-01-24 07:11, Lu Baolu wrote:
>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>> to provide domain specific operations. Move domain-specific callbacks
>> from iommu_ops to the domain_ops and hook them when a domain is 
>> allocated.
> 
> I think it would make a lot of sense for iommu_domain_ops to be a 
> substructure *within* iommu_ops. That way we can save most of the driver 
> boilerplate here by not needing extra variables everywhere, and letting 
> iommu_domain_alloc() still do a default initialisation like "domain->ops 
> = bus->iommu_ops->domain_ops;"

In the new model, iommu_domain_ops and iommu_ops are not 1:1 mapped.
For example, a PASID-capable IOMMU could support DMA domains (which
support map/unmap), SVA domains (which do not), and others in the
future. Each type of domain has its own domain_ops.
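
The non-1:1 mapping can be sketched standalone like this (mock types
and names only; the real split is what this series proposes): one driver
installs a different ops set depending on the domain type it allocates.

```c
#include <assert.h>
#include <string.h>

/* Mock per-type domain ops selection in domain_alloc(). */
enum mock_domain_type { MOCK_DOMAIN_DMA, MOCK_DOMAIN_SVA };

struct mock_domain_ops { const char *name; };

/* two ops sets from the same driver */
static const struct mock_domain_ops dma_domain_ops = { "dma" }; /* map/unmap */
static const struct mock_domain_ops sva_domain_ops = { "sva" }; /* bind/unbind */

struct mock_domain {
	enum mock_domain_type type;
	const struct mock_domain_ops *ops;
};

static struct mock_domain mock_domain_alloc(enum mock_domain_type type)
{
	struct mock_domain d = { .type = type };

	/* the domain carries the ops set matching its type */
	d.ops = (type == MOCK_DOMAIN_SVA) ? &sva_domain_ops : &dma_domain_ops;
	return d;
}
```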

> 
> I do like the future possibility of trying to flatten some of the 
> io-pgtable indirection by having the SMMU drivers subsequently swizzle 
> in alternate domain ops once they've picked a format, though. That was a 
> bit too clunky to achieve at the whole iommu_ops level when I 
> experimented with it years ago, but now it might well be worth another 
> try...
> 
> One other comment below.
> 
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   include/linux/iommu.h                       | 93 ++++++++++++---------
>>   drivers/iommu/amd/iommu.c                   | 21 +++--
>>   drivers/iommu/apple-dart.c                  | 24 ++++--
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++--
>>   drivers/iommu/arm/arm-smmu/arm-smmu.c       | 23 +++--
>>   drivers/iommu/arm/arm-smmu/qcom_iommu.c     | 17 ++--
>>   drivers/iommu/exynos-iommu.c                | 17 ++--
>>   drivers/iommu/fsl_pamu_domain.c             | 13 ++-
>>   drivers/iommu/intel/iommu.c                 | 25 ++++--
>>   drivers/iommu/iommu.c                       | 15 ++--
>>   drivers/iommu/ipmmu-vmsa.c                  | 21 +++--
>>   drivers/iommu/msm_iommu.c                   | 17 ++--
>>   drivers/iommu/mtk_iommu.c                   | 24 ++++--
>>   drivers/iommu/mtk_iommu_v1.c                | 19 +++--
>>   drivers/iommu/omap-iommu.c                  | 15 ++--
>>   drivers/iommu/rockchip-iommu.c              | 17 ++--
>>   drivers/iommu/s390-iommu.c                  | 15 ++--
>>   drivers/iommu/sprd-iommu.c                  | 19 +++--
>>   drivers/iommu/sun50i-iommu.c                | 18 ++--
>>   drivers/iommu/tegra-gart.c                  | 15 ++--
>>   drivers/iommu/tegra-smmu.c                  | 16 ++--
>>   drivers/iommu/virtio-iommu.c                | 18 ++--
>>   22 files changed, 305 insertions(+), 179 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 111b3e9c79bb..33c5c0e5c465 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -88,7 +88,7 @@ struct iommu_domain_geometry {
>>   struct iommu_domain {
>>       unsigned type;
>> -    const struct iommu_ops *ops;
>> +    const struct domain_ops *ops;
>>       unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
>>       iommu_fault_handler_t handler;
>>       void *handler_token;
>> @@ -192,26 +192,11 @@ struct iommu_iotlb_gather {
>>    * struct iommu_ops - iommu ops and capabilities
>>    * @capable: check capability
>>    * @domain_alloc: allocate iommu domain
>> - * @domain_free: free iommu domain
>> - * @attach_dev: attach device to an iommu domain
>> - * @detach_dev: detach device from an iommu domain
>> - * @map: map a physically contiguous memory region to an iommu domain
>> - * @map_pages: map a physically contiguous set of pages of the same 
>> size to
>> - *             an iommu domain.
>> - * @unmap: unmap a physically contiguous memory region from an iommu 
>> domain
>> - * @unmap_pages: unmap a number of pages of the same size from an 
>> iommu domain
>> - * @flush_iotlb_all: Synchronously flush all hardware TLBs for this 
>> domain
>> - * @iotlb_sync_map: Sync mappings created recently using @map to the 
>> hardware
>> - * @iotlb_sync: Flush all queued ranges from the hardware TLBs and 
>> empty flush
>> - *            queue
>> - * @iova_to_phys: translate iova to physical address
>>    * @probe_device: Add device to iommu driver handling
>>    * @release_device: Remove device from iommu driver handling
>>    * @probe_finalize: Do final setup work after the device is added to 
>> an IOMMU
>>    *                  group and attached to the groups domain
>>    * @device_group: find iommu group for a particular device
>> - * @enable_nesting: Enable nesting
>> - * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>>    * @get_resv_regions: Request list of reserved regions for a device
>>    * @put_resv_regions: Free list of reserved regions for a device
>>    * @apply_resv_region: Temporary helper call-back for iova reserved 
>> ranges
>> @@ -237,33 +222,11 @@ struct iommu_ops {
>>       /* Domain allocation and freeing by the iommu driver */
>>       struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
>> -    void (*domain_free)(struct iommu_domain *);
>> -    int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>> -    void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
>> -    int (*map)(struct iommu_domain *domain, unsigned long iova,
>> -           phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
>> -    int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
>> -             phys_addr_t paddr, size_t pgsize, size_t pgcount,
>> -             int prot, gfp_t gfp, size_t *mapped);
>> -    size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
>> -             size_t size, struct iommu_iotlb_gather *iotlb_gather);
>> -    size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long 
>> iova,
>> -                  size_t pgsize, size_t pgcount,
>> -                  struct iommu_iotlb_gather *iotlb_gather);
>> -    void (*flush_iotlb_all)(struct iommu_domain *domain);
>> -    void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long 
>> iova,
>> -                   size_t size);
>> -    void (*iotlb_sync)(struct iommu_domain *domain,
>> -               struct iommu_iotlb_gather *iotlb_gather);
>> -    phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, 
>> dma_addr_t iova);
>>       struct iommu_device *(*probe_device)(struct device *dev);
>>       void (*release_device)(struct device *dev);
>>       void (*probe_finalize)(struct device *dev);
>>       struct iommu_group *(*device_group)(struct device *dev);
>> -    int (*enable_nesting)(struct iommu_domain *domain);
>> -    int (*set_pgtable_quirks)(struct iommu_domain *domain,
>> -                  unsigned long quirks);
>>       /* Request/Free a list of reserved regions for a device */
>>       void (*get_resv_regions)(struct device *dev, struct list_head 
>> *list);
>> @@ -296,6 +259,60 @@ struct iommu_ops {
>>       struct module *owner;
>>   };
>> +/**
>> + * struct domain_ops - per-domain ops
>> + * @attach_dev: attach an iommu domain to a device
>> + * @detach_dev: detach an iommu domain from a device
>> + * @map: map a physically contiguous memory region to an iommu domain
>> + * @map_pages: map a physically contiguous set of pages of the same 
>> size to
>> + *             an iommu domain.
>> + * @unmap: unmap a physically contiguous memory region from an iommu 
>> domain
>> + * @unmap_pages: unmap a number of pages of the same size from an 
>> iommu domain
>> + * @flush_iotlb_all: Synchronously flush all hardware TLBs for this 
>> domain
>> + * @iotlb_sync_map: Sync mappings created recently using @map to the 
>> hardware
>> + * @iotlb_sync: Flush all queued ranges from the hardware TLBs and 
>> empty flush
>> + *            queue
>> + * @iova_to_phys: translate iova to physical address
>> + * @enable_nesting: Enable nesting
>> + * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>> + * @release: Release the domain after use.
> 
> Not sure about the name change here - it's still logically a "free" that 
> matches an "alloc", we're not doing anything clever like refcounting or 
> devres, which is what I'd tend to associate with a "release" function 
> for an allocated resource.

Yes. "free" is better.

> 
> Robin.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-25  4:59     ` Lu Baolu
@ 2022-01-25 12:37       ` Jason Gunthorpe
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-25 12:37 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Joerg Roedel, Christoph Hellwig, Ben Skeggs, Raj,
	Ashok, Will Deacon, Robin Murphy, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Tue, Jan 25, 2022 at 12:59:14PM +0800, Lu Baolu wrote:
> On 1/24/22 5:58 PM, Tian, Kevin wrote:
> > > From: Lu Baolu <baolu.lu@linux.intel.com>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > > +/**
> > > + * struct domain_ops - per-domain ops
> > > + * @attach_dev: attach an iommu domain to a device
> > > + * @detach_dev: detach an iommu domain from a device
> > 
> > What is the criteria about whether an op should be iommu_ops or domain_ops
> > when it requires both domain and device pointers like above two (and future
> > PASID-based attach)?
> 
> Generally ops belong to iommu_ops if they are only device oriented, and
> domain_ops if they are only domain oriented. But it's really hard to
> judge when both device and domain are involved. Good question. :-)
> 
> Perhaps we should leave the attach/detach interfaces in iommu_ops since
> they are related to device capabilities. For example, some devices
> support attach_dev_pasid, while others not.

Isn't the attach process for something like SVA or the KVM version
actually rather different code?

Ideally the function pointers should help minimize case statements/etc
inside the driver..

Or stated another way, I expect drivers will have different structs
container_of'ing back to the iommu_domain (ie intel_iommu_domain_sva,
say). Any code that needs to work differently depending on the struct
should have an op in the domain so it can sanely container_of without
a mess.
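
The container_of pattern described above can be sketched standalone
(mock types only; mock_sva_domain is a made-up stand-in for a
driver-private wrapper such as the hypothetical intel_iommu_domain_sva):

```c
#include <assert.h>
#include <stddef.h>

struct mock_domain;

struct mock_domain_ops {
	int (*attach_dev)(struct mock_domain *domain, int devid);
};

struct mock_domain {
	const struct mock_domain_ops *ops;
};

/* Driver-private wrapper embedding the generic domain. */
struct mock_sva_domain {
	struct mock_domain domain;
	unsigned long mm_token;	/* pretend this is the bound mm */
};

#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int sva_attach_dev(struct mock_domain *domain, int devid)
{
	/* Safe because this op is only ever installed on SVA domains,
	 * so no type check or case statement is needed. */
	struct mock_sva_domain *sva =
		mock_container_of(domain, struct mock_sva_domain, domain);

	(void)devid;
	return (int)sva->mm_token;
}

static const struct mock_domain_ops sva_ops = {
	.attach_dev = sva_attach_dev,
};
```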

Jason

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-25  6:27     ` Lu Baolu
@ 2022-01-25 14:23       ` Robin Murphy
  2022-01-25 15:00         ` Jason Gunthorpe
  0 siblings, 1 reply; 53+ messages in thread
From: Robin Murphy @ 2022-01-25 14:23 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Ben Skeggs, Kevin Tian, Ashok Raj, Will Deacon
  Cc: Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 2022-01-25 06:27, Lu Baolu wrote:
> On 1/25/22 8:57 AM, Robin Murphy wrote:
>> On 2022-01-24 07:11, Lu Baolu wrote:
>>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>>> to provide domain specific operations. Move domain-specific callbacks
>>> from iommu_ops to the domain_ops and hook them when a domain is 
>>> allocated.
>>
>> I think it would make a lot of sense for iommu_domain_ops to be a 
>> substructure *within* iommu_ops. That way we can save most of the 
>> driver boilerplate here by not needing extra variables everywhere, and 
>> letting iommu_domain_alloc() still do a default initialisation like 
>> "domain->ops = bus->iommu_ops->domain_ops;"
> 
> In the new model, iommu_domain_ops and iommu_ops are not 1:1 mapped.
> For example, a PASID-capable IOMMU could support DMA domain (which
> supports map/unmap), SVA domain (which does not), and others in the
> future. Different type of domain has its own domain_ops.

Sure, it's clear that that's the direction in which this is headed, and 
as I say I'm quite excited about that. However there are a couple of 
points I think are really worth considering:

Where it's just about which operations are valid for which domains, it's 
even simpler for the core interface wrappers to validate the domain 
type, rather than forcing drivers to implement multiple ops structures 
purely for the sake of having different callbacks populated. We already 
have this in places, e.g. where iommu_map() checks for 
__IOMMU_DOMAIN_PAGING.

Paging domains are also effectively the baseline level of IOMMU API 
functionality. All drivers support them, and for the majority of drivers 
it's all they will ever support. Those drivers really don't benefit from 
any of the churn and boilerplate in this patch as-is, and it's so easy 
to compromise with a couple of lines of core code to handle the common 
case by default when the driver *isn't* one of the handful which ever 
actually cares to install their own per-domain ops. Consider how much 
cleaner this patch would look if the typical driver diff could be 
something completely minimal like this:

----->8-----
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index be22fcf988ce..6aff493e37ee 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -514,12 +514,14 @@ static int mtk_iommu_hw_init(const struct 
mtk_iommu_data *data)

  static const struct iommu_ops mtk_iommu_ops = {
  	.domain_alloc	= mtk_iommu_domain_alloc,
-	.domain_free	= mtk_iommu_domain_free,
-	.attach_dev	= mtk_iommu_attach_device,
-	.detach_dev	= mtk_iommu_detach_device,
-	.map		= mtk_iommu_map,
-	.unmap		= mtk_iommu_unmap,
-	.iova_to_phys	= mtk_iommu_iova_to_phys,
+	.domain_ops = &(const struct iommu_domain_ops){
+		.domain_free	= mtk_iommu_domain_free,
+		.attach_dev	= mtk_iommu_attach_device,
+		.detach_dev	= mtk_iommu_detach_device,
+		.map		= mtk_iommu_map,
+		.unmap		= mtk_iommu_unmap,
+		.iova_to_phys	= mtk_iommu_iova_to_phys,
+	},
  	.probe_device	= mtk_iommu_probe_device,
  	.probe_finalize = mtk_iommu_probe_finalize,
  	.release_device	= mtk_iommu_release_device,

-----8<-----

And of course I have to shy away from calling it default_domain_ops, 
since although it's logically default ops for domains, it is not 
specifically ops for default domains, sigh... :)

Cheers,
Robin.

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-24 17:44   ` Jason Gunthorpe
  2022-01-25  1:11     ` Tian, Kevin
@ 2022-01-25 14:48     ` Robin Murphy
  2022-01-25 15:16       ` Jason Gunthorpe
  1 sibling, 1 reply; 53+ messages in thread
From: Robin Murphy @ 2022-01-25 14:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Lu Baolu, Joerg Roedel, Christoph Hellwig, Ben Skeggs, Raj,
	Ashok, Will Deacon, Alex Williamson, Eric Auger, Liu, Yi L, Pan,
	Jacob jun, David Airlie, Daniel Vetter, Thierry Reding,
	Jonathan Hunter, iommu, linux-kernel

On 2022-01-24 17:44, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 09:46:26AM +0000, Tian, Kevin wrote:
>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>> Sent: Monday, January 24, 2022 3:11 PM
>>>
>>> Hi,
>>>
>>> The guest pasid and aux-domain related code are dead code in current
>>> iommu subtree. As we have reached a consensus that all these features
>>> should be based on the new iommufd framework (which is under active
>>> development), the first part of this series removes and cleanups all
>>> the dead code.
>>>
>>> The second part of this series refactors the iommu_domain by moving all
>>> domain-specific ops from iommu_ops to a new domain_ops. This makes an
>>> iommu_domain self-contained and represent the abstraction of an I/O
>>> translation table in the IOMMU subsystem. With different type of
>>> iommu_domain providing different set of ops, it's easier to support more
>>> types of I/O translation tables.
>>
>> You may want to give more background on this end goal. In general there
>> are four IOPT types in iommufd discussions:
>>
>> 1) The one currently tracked by iommu_domain, with a map/unmap semantics
>> 2) The one managed by mm and shared to iommu via sva_bind/unbind ops
>> 3) The one managed by userspace and bound to iommu via iommufd (require nesting)
>> 4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD interface
> 
> Yes, at least from an iommufd perspective I'd like to see one struct
> for all of these types, mainly so we can have a uniform alloc, attach
> and detatch flow for all io page table types.

Agreed, certainly an IOMMU_DOMAIN_SVA type that can both encapsulate the 
mm and effectively replace iommu_sva seems like a logical and fairly 
small next step. We already have the paradigm of different domain types 
supporting different ops, so initially an SVA domain would simply allow 
bind/unbind rather than attach/detach/map/unmap.

It might then further be possible to hide SVA bind/unbind behind the 
attach/detach interface, but AFAICS we'd still need distinct flows for 
attaching/binding the whole device vs. attaching/binding to a PASID, 
since they are fundamentally different things in their own right, and 
the ideal API should give us the orthogonality to also bind a device to 
an SVA domain without PASID (e.g. for KVM stage 2, or userspace 
assignment of simpler fault/stall-tolerant devices), or attach PASIDs to 
regular iommu_domains.

That distinction could of course be consolidated by flipping to the 
other approach of explicitly allocating the PASID first, then wrapping 
it in a struct device that could then be passed through the same 
attach/detach interfaces and distinguished internally, but although I 
still have a fondness for that approach I know I'm about the only one :)

Cheers,
Robin.

> If we want to use the iommu_domain, or make iommu_domain a sub-class
> of a new struct, can be determined as we go along.
> 
> Regardless, I think this cleanup stands on its own. Moving the ops and
> purging the dead code is clearly the right thing to do.
> 
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-25 14:23       ` Robin Murphy
@ 2022-01-25 15:00         ` Jason Gunthorpe
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-25 15:00 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lu Baolu, Joerg Roedel, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Alex Williamson, Eric Auger,
	Liu Yi L, Jacob jun Pan, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Tue, Jan 25, 2022 at 02:23:52PM +0000, Robin Murphy wrote:
> On 2022-01-25 06:27, Lu Baolu wrote:

> Where it's just about which operations are valid for which domains, it's
> even simpler for the core interface wrappers to validate the domain type,
> rather than forcing drivers to implement multiple ops structures purely for
> the sake of having different callbacks populated. We already have this in
> places, e.g. where iommu_map() checks for __IOMMU_DOMAIN_PAGING.

In my experience it is usually much clearer to directly test the op
for NULL to know if a feature is supported than to invent flags to do
the same test.  eg ops->map/etc == NULL means no paging.

I think we should not be afraid to have multiple ops in drivers for
things that are actually different in the driver. This is usually a
net win vs trying to handle all the cases with different 'if' flows.

eg identity domains and others really would ideally eventually have a
NULL ops for map/unmap too.

The 'type' should conceptually be part of the ops, not the mutable
struct - but we don't have to get there all at once.

> Paging domains are also effectively the baseline level of IOMMU API
> functionality. All drivers support them, and for the majority of drivers
> it's all they will ever support. Those drivers really don't benefit from any
> of the churn and boilerplate in this patch as-is, and it's so easy to
> compromise with a couple of lines of core code to handle the common case by
> default when the driver *isn't* one of the handful which ever actually cares
> to install their own per-domain ops. Consider how much cleaner this patch
> would look if the typical driver diff could be something completely minimal
> like this:

It is clever, but I'm not sure if hoisting a single assignment out of
the driver is worth the small long term complexity of having different
driver flows?

Jason

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-25 14:48     ` Robin Murphy
@ 2022-01-25 15:16       ` Jason Gunthorpe
  2022-01-26  1:51         ` Lu Baolu
  0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-25 15:16 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Tian, Kevin, Lu Baolu, Joerg Roedel, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Tue, Jan 25, 2022 at 02:48:02PM +0000, Robin Murphy wrote:
 
> Agreed, certainly an IOMMU_DOMAIN_SVA type that can both encapsulate the mm
> and effectively replace iommu_sva seems like a logical and fairly small next
> step. We already have the paradigm of different domain types supporting
> different ops, so initially an SVA domain would simply allow bind/unbind
> rather than attach/detach/map/unmap.

I hope we can quickly get to a PASID enabled generic attach/detach
scheme - we really need this to do the uAPI part of this interface.

> they are fundamentally different things in their own right, and the ideal
> API should give us the orthogonality to also bind a device to an SVA domain
> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.

Yes, these are orthogonal things. An iommu driver that supports PASID
ideally should support PASID-enabled attach/detach for every
iommu_domain type it supports.

SVA should not be entangled with PASID beyond the fact that SVA is often
used with PASID - an SVA iommu_domain should be fully usable with a RID too.

I'm hoping to see the core iommu code provide some simplified "SVA"
API that under the covers creates an SVA domain and then does a normal
PASID attach using the global PASID in the mm_struct - the driver
should not care which PASID, if any, is used for an SVA domain.

Jason

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-25 15:16       ` Jason Gunthorpe
@ 2022-01-26  1:51         ` Lu Baolu
  2022-01-26 13:27           ` Jason Gunthorpe
  0 siblings, 1 reply; 53+ messages in thread
From: Lu Baolu @ 2022-01-26  1:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: baolu.lu, Tian, Kevin, Joerg Roedel, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On 1/25/22 11:16 PM, Jason Gunthorpe wrote:
> On Tue, Jan 25, 2022 at 02:48:02PM +0000, Robin Murphy wrote:
>   
>> Agreed, certainly an IOMMU_DOMAIN_SVA type that can both encapsulate the mm
>> and effectively replace iommu_sva seems like a logical and fairly small next
>> step. We already have the paradigm of different domain types supporting
>> different ops, so initially an SVA domain would simply allow bind/unbind
>> rather than attach/detach/map/unmap.
> 
> I hope we can quickly get to a PASID enabled generic attach/detach
> scheme - we really need this to do the uAPI part of this interface.

Agreed. Jacob is working on kernel DMA with PASID. He needs such
interfaces as well. I have worked out an implementation for the VT-d
driver. It could be posted for review as part of Jacob's series for
kernel DMA with PASID.

> 
>> they are fundamentally different things in their own right, and the ideal
>> API should give us the orthogonality to also bind a device to an SVA domain
>> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
>> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
> 
> Yes, these are orthogonal things. An iommu driver that supports PASID
> ideally should support PASID-enabled attach/detach for every
> iommu_domain type it supports.
> 
> SVA should not be entangled with PASID beyond the fact that SVA is often
> used with PASID - an SVA iommu_domain should be fully usable with a RID too.

The prototype of PASID enabled attach/detach ops could look like:

        int (*attach_dev_pasid)(struct iommu_domain *domain,
                                struct device *dev, ioasid_t id);
        void (*detach_dev_pasid)(struct iommu_domain *domain,
                                 struct device *dev, ioasid_t id);

But the iommu driver should implement different callbacks for

1) attaching an IOMMU DMA domain to a PASID on device;
    - kernel DMA with PASID
    - mdev-like device passthrough
    - etc.
2) attaching a CPU-shared domain to a PASID on device;
    - SVA
    - guest PASID
    - etc.

> 
> I'm hoping to see the core iommu code provide some simplified "SVA"
> API that under the covers creates an SVA domain and then does a normal
> PASID attach using the global PASID in the mm_struct - the driver
> should not care which PASID, if any, is used for an SVA domain.
> 
> Jason
> 

Best regards,
baolu

* Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
  2022-01-24 16:33       ` Jason Gunthorpe
@ 2022-01-26  9:41         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 53+ messages in thread
From: Jean-Philippe Brucker @ 2022-01-26  9:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Lu Baolu, Joerg Roedel, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Robin Murphy, David Airlie,
	linux-kernel, iommu, Jonathan Hunter, Alex Williamson,
	Thierry Reding, Pan, Jacob jun, Daniel Vetter

On Mon, Jan 24, 2022 at 12:33:02PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 10:16:07AM +0000, Jean-Philippe Brucker wrote:
> > On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > > From: Lu Baolu <baolu.lu@linux.intel.com>
> > > > Sent: Monday, January 24, 2022 3:11 PM
> > > > +/**
> > > > + * struct domain_ops - per-domain ops
> > > > + * @attach_dev: attach an iommu domain to a device
> > > > + * @detach_dev: detach an iommu domain from a device
> > > 
> > > What is the criteria about whether an op should be iommu_ops or domain_ops
> > > when it requires both domain and device pointers like above two (and future
> > > PASID-based attach)?
> > > 
> > > Other examples include:
> > > 	@apply_resv_region
> > > 	@is_attach_deferred
> > 
> > Could attach_dev() be an IOMMU op?  Then a driver could set the domain
> > ops in attach_dev() rather than domain_alloc(). That would allow
> > installing map()/unmap() ops that are tailored for the device's IOMMU,
> > which we don't know at domain_alloc() time.
> 
> I think we should be moving toward 'domain_alloc' returning the
> correct domain and the way the driver implements the domain shouldn't
> change after that.
> 
> > I'm thinking about a guest that has both physical and virtual
> > endpoints, which would ideally use different kinds of domain ops to
> > support both efficiently (caching mode vs page tables)
> 
> In this case shouldn't domain_alloc() reached from the struct device
> already do the correct thing?

Sure, if we can finalise the domains before attach that could also clean
up the drivers a bit.

Thanks,
Jean

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-26  1:51         ` Lu Baolu
@ 2022-01-26 13:27           ` Jason Gunthorpe
  2022-01-26 14:00             ` Robin Murphy
  0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2022-01-26 13:27 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Robin Murphy, Tian, Kevin, Joerg Roedel, Christoph Hellwig,
	Ben Skeggs, Raj, Ashok, Will Deacon, Alex Williamson, Eric Auger,
	Liu, Yi L, Pan, Jacob jun, David Airlie, Daniel Vetter,
	Thierry Reding, Jonathan Hunter, iommu, linux-kernel

On Wed, Jan 26, 2022 at 09:51:36AM +0800, Lu Baolu wrote:

> > > they are fundamentally different things in their own right, and the ideal
> > > API should give us the orthogonality to also bind a device to an SVA domain
> > > without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
> > > fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
> > 
> > Yes, these are orthogonal things. An iommu driver that supports PASID
> > ideally should support PASID-enabled attach/detach for every
> > iommu_domain type it supports.
> > 
> > SVA should not be entangled with PASID beyond the fact that SVA is
> > often used with PASID - an SVA iommu_domain should be fully usable
> > with a RID too.
> 
> The prototype of PASID enabled attach/detach ops could look like:
> 
>        int (*attach_dev_pasid)(struct iommu_domain *domain,
>                                struct device *dev, ioasid_t id);
>        void (*detach_dev_pasid)(struct iommu_domain *domain,
>                                 struct device *dev, ioasid_t id);

It seems reasonable and straightforward to me.

These would be domain ops?
 
> But the iommu driver should implement different callbacks for
> 
> 1) attaching an IOMMU DMA domain to a PASID on device;
>    - kernel DMA with PASID
>    - mdev-like device passthrough
>    - etc.
> 2) attaching a CPU-shared domain to a PASID on device;
>    - SVA
>    - guest PASID
>    - etc.

By this do you mean domain->ops would be different? Seems fine, up to
the driver.

I'd hope to see some flow like:

 domain = device->bus->iommu_ops->alloc_sva_domain(dev)
 domain->ops->attach_dev_pasid(domain, dev, current->pasid)

To duplicate the current SVA APIs

Jason

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-26 13:27           ` Jason Gunthorpe
@ 2022-01-26 14:00             ` Robin Murphy
  0 siblings, 0 replies; 53+ messages in thread
From: Robin Murphy @ 2022-01-26 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu
  Cc: Tian, Kevin, Raj, Ashok, David Airlie, Will Deacon, iommu,
	Jonathan Hunter, Christoph Hellwig, Alex Williamson,
	Thierry Reding, Ben Skeggs, Daniel Vetter, linux-kernel, Pan,
	Jacob jun

On 2022-01-26 13:27, Jason Gunthorpe via iommu wrote:
> On Wed, Jan 26, 2022 at 09:51:36AM +0800, Lu Baolu wrote:
> 
>>>> they are fundamentally different things in their own right, and the ideal
>>>> API should give us the orthogonality to also bind a device to an SVA domain
>>>> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
>>>> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
>>>
>>> Yes, these are orthogonal things. An iommu driver that supports PASID
>>> ideally should support PASID-enabled attach/detach for every
>>> iommu_domain type it supports.
>>>
>>> SVA should not be entangled with PASID beyond the fact that SVA is often
>>> used with PASID - an SVA iommu_domain should be fully usable with a RID too.
>>
>> The prototype of PASID enabled attach/detach ops could look like:
>>
>>         int (*attach_dev_pasid)(struct iommu_domain *domain,
>>                                 struct device *dev, ioasid_t id);
>>         void (*detach_dev_pasid)(struct iommu_domain *domain,
>>                                  struct device *dev, ioasid_t id);
> 
> It seems reasonable and straightforward to me.
> 
> These would be domain ops?
>   
>> But the iommu driver should implement different callbacks for
>>
>> 1) attaching an IOMMU DMA domain to a PASID on device;
>>     - kernel DMA with PASID
>>     - mdev-like device passthrough
>>     - etc.
>> 2) attaching a CPU-shared domain to a PASID on device;
>>     - SVA
>>     - guest PASID
>>     - etc.
> 
> By this do you mean domain->ops would be different? Seems fine, up to
> the driver.

Indeed, it makes little practical difference whether the driver provides 
distinct sets of ops or just common callbacks which switch on the domain 
type internally. I expect that's largely going to come down to personal 
preference and how much internal driver code is common between the paths.

> I'd hope to see some flow like:
> 
>   domain = device->bus->iommu_ops->alloc_sva_domain(dev)
>   domain->ops->attach_dev_pasid(domain, dev, current->pasid)
> 
> To duplicate the current SVA APIs

+1 - I'd concur that attach/detach belong as domain ops in general. 
There's quite a nice logical split forming here where domain ops are the 
ones that make sense for external subsystems and endpoint drivers to use 
too, while device ops, with the sole exception of domain_alloc, are 
IOMMU API internals. By that reasoning, attach/bind/etc. are firmly in 
the former category.

Thanks,
Robin.

* Re: [PATCH 0/7] iommu cleanup and refactoring
  2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
                   ` (7 preceding siblings ...)
  2022-01-24  9:46 ` [PATCH 0/7] iommu cleanup and refactoring Tian, Kevin
@ 2022-02-08  1:32 ` Lu Baolu
  8 siblings, 0 replies; 53+ messages in thread
From: Lu Baolu @ 2022-02-08  1:32 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Kevin Tian, Ashok Raj, Will Deacon, Robin Murphy
  Cc: baolu.lu, Alex Williamson, Eric Auger, Liu Yi L, Jacob jun Pan,
	David Airlie, Daniel Vetter, Thierry Reding, Jonathan Hunter,
	iommu, linux-kernel

On 1/24/22 3:10 PM, Lu Baolu wrote:
> Hi,
> 
> The guest pasid and aux-domain related code are dead code in the
> current iommu subtree. As we have reached a consensus that all these
> features should be based on the new iommufd framework (which is under
> active development), the first part of this series removes and cleans
> up all the dead code.
> 
> The second part of this series refactors the iommu_domain by moving all
> domain-specific ops from iommu_ops to a new domain_ops. This makes an
> iommu_domain self-contained, representing the abstraction of an I/O
> translation table in the IOMMU subsystem. With different types of
> iommu_domain providing different sets of ops, it is easier to support
> more types of I/O translation tables.
> 
> Please help to review and comment.

Thank you all for the great comments. A new version of this series has
been posted.

https://lore.kernel.org/linux-iommu/20220208012559.1121729-1-baolu.lu@linux.intel.com/

Please check whether I missed anything.

Best regards,
baolu

end of thread, other threads:[~2022-02-08  1:50 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
2022-01-24  7:10 [PATCH 0/7] iommu cleanup and refactoring Lu Baolu
2022-01-24  7:10 ` [PATCH 1/7] iommu/vt-d: Remove guest pasid related callbacks Lu Baolu
2022-01-24  9:25   ` Christoph Hellwig
2022-01-24  7:10 ` [PATCH 2/7] iommu: Remove guest pasid related interfaces and definitions Lu Baolu
2022-01-24  9:26   ` Christoph Hellwig
2022-01-24  7:10 ` [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks Lu Baolu
2022-01-24  9:26   ` Christoph Hellwig
2022-01-24  7:10 ` [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops Lu Baolu
2022-01-24  9:27   ` Christoph Hellwig
2022-01-24  7:11 ` [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain Lu Baolu
2022-01-24  9:29   ` Christoph Hellwig
2022-01-25  2:59     ` Lu Baolu
2022-01-24  7:11 ` [PATCH 6/7] iommu: Use right way to retrieve iommu_ops Lu Baolu
2022-01-24  9:32   ` Christoph Hellwig
2022-01-25  3:01     ` Lu Baolu
2022-01-24  9:48   ` Tian, Kevin
2022-01-25  3:04     ` Lu Baolu
2022-01-24 17:36   ` Jason Gunthorpe
2022-01-25  1:11     ` Tian, Kevin
2022-01-25  1:52       ` Robin Murphy
2022-01-25  2:08         ` Tian, Kevin
2022-01-25  3:31           ` Lu Baolu
2022-01-25  3:18     ` Lu Baolu
2022-01-25  0:20   ` Robin Murphy
2022-01-25  3:54     ` Lu Baolu
2022-01-24  7:11 ` [PATCH 7/7] iommu: Add iommu_domain::domain_ops Lu Baolu
2022-01-24  9:37   ` Christoph Hellwig
2022-01-24 17:24     ` Jason Gunthorpe
2022-01-25  4:43       ` Lu Baolu
2022-01-25  4:42     ` Lu Baolu
2022-01-24  9:58   ` Tian, Kevin
2022-01-24 10:16     ` Jean-Philippe Brucker
2022-01-24 16:33       ` Jason Gunthorpe
2022-01-26  9:41         ` Jean-Philippe Brucker
2022-01-24 17:17     ` Jason Gunthorpe
2022-01-25  0:58       ` Tian, Kevin
2022-01-25  4:59     ` Lu Baolu
2022-01-25 12:37       ` Jason Gunthorpe
2022-01-24 17:55   ` Jason Gunthorpe
2022-01-25  5:04     ` Lu Baolu
2022-01-25  0:57   ` Robin Murphy
2022-01-25  6:27     ` Lu Baolu
2022-01-25 14:23       ` Robin Murphy
2022-01-25 15:00         ` Jason Gunthorpe
2022-01-24  9:46 ` [PATCH 0/7] iommu cleanup and refactoring Tian, Kevin
2022-01-24 17:44   ` Jason Gunthorpe
2022-01-25  1:11     ` Tian, Kevin
2022-01-25 14:48     ` Robin Murphy
2022-01-25 15:16       ` Jason Gunthorpe
2022-01-26  1:51         ` Lu Baolu
2022-01-26 13:27           ` Jason Gunthorpe
2022-01-26 14:00             ` Robin Murphy
2022-02-08  1:32 ` Lu Baolu
