iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/4] Re-enable IDXD kernel workqueue under DMA API
@ 2023-05-19 20:32 Jacob Pan
  2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Jacob Pan @ 2023-05-19 20:32 UTC (permalink / raw)
  To: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine, vkoul
  Cc: Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan, Jacob Pan

Hi Joerg and all,

IDXD kernel work queues were disabled due to the flawed use of kernel VA
and SVA API.
Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/

The solution is to enable it under DMA API where IDXD shared workqueue users
can use ENQCMDS to submit work on buffers mapped by DMA API.

This patchset adds support for attaching PASID to the device's default
domain and the ability to allocate global PASIDs from IOMMU APIs. IDXD driver
can then re-enable the kernel work queues and use them under DMA API.

This depends on the IOASID removal series. (merged)
https://lore.kernel.org/all/ZCaUBJvUMsJyD7EW@8bytes.org/


Thanks,

Jacob

---
Changelog:
v6:
	- use a simplified version of vt-d driver change for set_device_pasid
	  from Baolu.
	- check and rename global PASID allocation base
v5:
	- exclude two patches related to supervisor mode, taken by VT-d
	maintainer Baolu.
	- move PASID range check into allocation API so that device drivers
	  only need to pass in struct device*. (Kevin)
	- factor out helper functions in device-domain attach (Baolu)
	- make explicit use of RID_PASID across architectures
v4:
	- move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
	- dropped domain type check while disabling idxd system PASID (Baolu)

v3:
	- moved global PASID allocation API from SVA to IOMMU (Kevin)
	- remove #ifdef around global PASID reservation during boot (Baolu)
	- remove restriction on PASID 0 allocation (Baolu)
	- fix a bug in sysfs domain change when attaching devices
	- clear idxd user interrupt enable bit after disabling device( Fenghua)
v2:
	- refactored device PASID attach domain ops based on Baolu's early patch
	- addressed TLB flush gap
	- explicitly reserve RID_PASID from SVA PASID number space
	- get dma domain directly, avoid checking domain types



Jacob Pan (3):
  iommu: Generalize default PCIe requester ID PASID
  iommu: Move global PASID allocation from SVA to core
  dmaengine/idxd: Re-enable kernel workqueue under DMA API

Lu Baolu (1):
  iommu/vt-d: Add set_dev_pasid callback for dma domain

 drivers/dma/idxd/device.c                     |  30 +---
 drivers/dma/idxd/dma.c                        |   5 +-
 drivers/dma/idxd/init.c                       |  60 ++++++-
 drivers/dma/idxd/sysfs.c                      |   7 -
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  10 +-
 drivers/iommu/intel/iommu.c                   | 159 +++++++++++++++---
 drivers/iommu/intel/iommu.h                   |   7 +
 drivers/iommu/intel/pasid.c                   |   2 +-
 drivers/iommu/intel/pasid.h                   |   1 -
 drivers/iommu/iommu-sva.c                     |  33 ++--
 drivers/iommu/iommu.c                         |  24 +++
 include/linux/iommu.h                         |  11 ++
 13 files changed, 265 insertions(+), 86 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID
  2023-05-19 20:32 [PATCH v6 0/4] Re-enable IDXD kernel workqueue under DMA API Jacob Pan
@ 2023-05-19 20:32 ` Jacob Pan
  2023-05-21  6:21   ` Baolu Lu
                     ` (2 more replies)
  2023-05-19 20:32 ` [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core Jacob Pan
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 17+ messages in thread
From: Jacob Pan @ 2023-05-19 20:32 UTC (permalink / raw)
  To: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine, vkoul
  Cc: Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan, Jacob Pan

PCIe Process address space ID (PASID) is used to tag DMA traffic, it
provides finer grained isolation than requester ID (RID).

For each RID, 0 is as a special PASID for the legacy DMA (without
PASID), thus RID_PASID. This is universal across all architectures,
therefore warranted to be declared in the common header.
Noting that VT-d could support none-zero RID_PASID, but currently not
used.

By having a common RID_PASID, we can avoid conflicts between different
use cases in the generic code. e.g. SVA and DMA API with PASIDs.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v6:
   - let SMMU code use the common RID_PASID macro
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 10 ++++----
 drivers/iommu/intel/iommu.c                   | 24 +++++++++----------
 drivers/iommu/intel/pasid.c                   |  2 +-
 drivers/iommu/intel/pasid.h                   |  1 -
 include/linux/iommu.h                         |  1 +
 6 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..160b31e6239d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -80,7 +80,7 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 	 * be some overlap between use of both ASIDs, until we invalidate the
 	 * TLB.
 	 */
-	arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
+	arm_smmu_write_ctx_desc(smmu_domain, IOMMU_DEF_RID_PASID, cd);
 
 	/* Invalidate TLB entries previously associated with that context */
 	arm_smmu_tlb_inv_asid(smmu, asid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3fd83fb75722..9a6ea2d373c2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1053,7 +1053,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
 	/*
 	 * This function handles the following cases:
 	 *
-	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
+	 * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_DEF_RID_PASID = 0).
 	 * (2) Install a secondary CD, for SID+SSID traffic.
 	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
 	 *     CD, then invalidate the old entry and mappings.
@@ -1869,7 +1869,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
 		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 	}
-	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
+	arm_smmu_atc_inv_domain(smmu_domain, IOMMU_DEF_RID_PASID, 0, 0);
 }
 
 static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
@@ -1957,7 +1957,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 	 * Unfortunately, this can't be leaf-only since we may have
 	 * zapped an entire table.
 	 */
-	arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
+	arm_smmu_atc_inv_domain(smmu_domain, IOMMU_DEF_RID_PASID, iova, size);
 }
 
 void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
@@ -2131,7 +2131,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	 * the master has been added to the devices list for this domain.
 	 * This isn't an issue because the STE hasn't been installed yet.
 	 */
-	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &cfg->cd);
+	ret = arm_smmu_write_ctx_desc(smmu_domain, IOMMU_DEF_RID_PASID, &cfg->cd);
 	if (ret)
 		goto out_free_cd_tables;
 
@@ -2317,7 +2317,7 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 	pdev = to_pci_dev(master->dev);
 
 	atomic_inc(&smmu_domain->nr_ats_masters);
-	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
+	arm_smmu_atc_inv_domain(smmu_domain, IOMMU_DEF_RID_PASID, 0, 0);
 	if (pci_enable_ats(pdev, stu))
 		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
 }
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b871a6afd803..5b4ec7cfc1a4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -877,7 +877,7 @@ void dmar_fault_dump_ptes(struct intel_iommu *iommu, u16 source_id,
 	}
 	/* For request-without-pasid, get the pasid from context entry */
 	if (intel_iommu_sm && pasid == IOMMU_PASID_INVALID)
-		pasid = PASID_RID2PASID;
+		pasid = IOMMU_DEF_RID_PASID;
 
 	dir_index = pasid >> PASID_PDE_SHIFT;
 	pde = &dir[dir_index];
@@ -1457,7 +1457,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
 	qdep = info->ats_qdep;
 	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
 			   qdep, addr, mask);
-	quirk_extra_dev_tlb_flush(info, addr, mask, PASID_RID2PASID, qdep);
+	quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_DEF_RID_PASID, qdep);
 }
 
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
@@ -1492,7 +1492,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 		ih = 1 << 6;
 
 	if (domain->use_first_level) {
-		qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+		qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, addr, pages, ih);
 	} else {
 		unsigned long bitmask = aligned_pages - 1;
 
@@ -1562,7 +1562,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
 		u16 did = domain_id_iommu(dmar_domain, iommu);
 
 		if (dmar_domain->use_first_level)
-			qi_flush_piotlb(iommu, did, PASID_RID2PASID, 0, -1, 0);
+			qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, 0, -1, 0);
 		else
 			iommu->flush.flush_iotlb(iommu, did, 0, 0,
 						 DMA_TLB_DSI_FLUSH);
@@ -1952,7 +1952,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 				context_pdts(pds);
 
 		/* Setup the RID_PASID field: */
-		context_set_sm_rid2pasid(context, PASID_RID2PASID);
+		context_set_sm_rid2pasid(context, IOMMU_DEF_RID_PASID);
 
 		/*
 		 * Setup the Device-TLB enable bit and Page request
@@ -2432,13 +2432,13 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 		/* Setup the PASID entry for requests without PASID: */
 		if (hw_pass_through && domain_type_is_si(domain))
 			ret = intel_pasid_setup_pass_through(iommu, domain,
-					dev, PASID_RID2PASID);
+					dev, IOMMU_DEF_RID_PASID);
 		else if (domain->use_first_level)
 			ret = domain_setup_first_level(iommu, domain, dev,
-					PASID_RID2PASID);
+					IOMMU_DEF_RID_PASID);
 		else
 			ret = intel_pasid_setup_second_level(iommu, domain,
-					dev, PASID_RID2PASID);
+					dev, IOMMU_DEF_RID_PASID);
 		if (ret) {
 			dev_err(dev, "Setup RID2PASID failed\n");
 			device_block_translation(dev);
@@ -3975,7 +3975,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
 	if (!dev_is_real_dma_subdevice(info->dev)) {
 		if (dev_is_pci(info->dev) && sm_supported(iommu))
 			intel_pasid_tear_down_entry(iommu, info->dev,
-					PASID_RID2PASID, false);
+					IOMMU_DEF_RID_PASID, false);
 
 		iommu_disable_pci_caps(info);
 		domain_context_clear(info);
@@ -4004,7 +4004,7 @@ static void device_block_translation(struct device *dev)
 	if (!dev_is_real_dma_subdevice(dev)) {
 		if (sm_supported(iommu))
 			intel_pasid_tear_down_entry(iommu, dev,
-						    PASID_RID2PASID, false);
+						    IOMMU_DEF_RID_PASID, false);
 		else
 			domain_context_clear(info);
 	}
@@ -4339,7 +4339,7 @@ static void domain_set_force_snooping(struct dmar_domain *domain)
 
 	list_for_each_entry(info, &domain->devices, link)
 		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
-						     PASID_RID2PASID);
+						     IOMMU_DEF_RID_PASID);
 }
 
 static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
@@ -4994,7 +4994,7 @@ void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
 		return;
 
 	sid = PCI_DEVID(info->bus, info->devfn);
-	if (pasid == PASID_RID2PASID) {
+	if (pasid == IOMMU_DEF_RID_PASID) {
 		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
 				   qdep, address, mask);
 	} else {
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..e8c60af8591b 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -438,7 +438,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	 * SVA usage, device could do DMA with multiple PASIDs. It is more
 	 * efficient to flush devTLB specific to the PASID.
 	 */
-	if (pasid == PASID_RID2PASID)
+	if (pasid == IOMMU_DEF_RID_PASID)
 		qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
 	else
 		qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..027d30afaba6 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -10,7 +10,6 @@
 #ifndef __INTEL_PASID_H
 #define __INTEL_PASID_H
 
-#define PASID_RID2PASID			0x0
 #define PASID_MIN			0x1
 #define PASID_MAX			0x100000
 #define PASID_PTE_MASK			0x3F
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e8c9a7da1060..d8327f2b6fcc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -190,6 +190,7 @@ enum iommu_dev_features {
 	IOMMU_DEV_FEAT_IOPF,
 };
 
+#define IOMMU_DEF_RID_PASID	(0U) /* Reserved for DMA w/o PASID */
 #define IOMMU_PASID_INVALID	(-1U)
 typedef unsigned int ioasid_t;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core
  2023-05-19 20:32 [PATCH v6 0/4] Re-enable IDXD kernel workqueue under DMA API Jacob Pan
  2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
@ 2023-05-19 20:32 ` Jacob Pan
  2023-05-21  6:21   ` Baolu Lu
  2023-05-29 19:43   ` Jason Gunthorpe
  2023-05-19 20:32 ` [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain Jacob Pan
  2023-05-19 20:32 ` [PATCH v6 4/4] dmaengine/idxd: Re-enable kernel workqueue under DMA API Jacob Pan
  3 siblings, 2 replies; 17+ messages in thread
From: Jacob Pan @ 2023-05-19 20:32 UTC (permalink / raw)
  To: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine, vkoul
  Cc: Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan, Jacob Pan

Global PASID can be used beyond SVA. For example, drivers that use
Intel ENQCMD to submit work must use global PASIDs in that PASID
is stored in a per CPU MSR. When such device need to submit work
for in-kernel DMA with PASID, it must allocate PASIDs from the same
global number space to avoid conflict.

This patch moves global PASID allocation APIs from SVA to IOMMU APIs.
Reserved PASIDs, currently only RID_PASID, are excluded from the global
PASID allocation.

It is expected that device drivers will use the allocated PASIDs to
attach to appropriate IOMMU domains for use.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v6: explicitly exclude reserved a range from SVA PASID allocation
    check mm PASID compatibility with device
v5: move PASID range check inside API so that device drivers only pass
    in struct device* (Kevin)
v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
---
 drivers/iommu/iommu-sva.c | 33 ++++++++++++++-------------------
 drivers/iommu/iommu.c     | 24 ++++++++++++++++++++++++
 include/linux/iommu.h     | 10 ++++++++++
 3 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 9821bc44f5ac..7fe8e977d8eb 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -10,33 +10,33 @@
 #include "iommu-sva.h"
 
 static DEFINE_MUTEX(iommu_sva_lock);
-static DEFINE_IDA(iommu_global_pasid_ida);
 
 /* Allocate a PASID for the mm within range (inclusive) */
-static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
+static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
 {
+	ioasid_t pasid;
 	int ret = 0;
 
-	if (min == IOMMU_PASID_INVALID ||
-	    max == IOMMU_PASID_INVALID ||
-	    min == 0 || max < min)
-		return -EINVAL;
-
 	if (!arch_pgtable_dma_compat(mm))
 		return -EBUSY;
 
 	mutex_lock(&iommu_sva_lock);
 	/* Is a PASID already associated with this mm? */
 	if (mm_valid_pasid(mm)) {
-		if (mm->pasid < min || mm->pasid > max)
-			ret = -EOVERFLOW;
+		if (mm->pasid <= dev->iommu->max_pasids)
+			goto out;
+		dev_err(dev, "current mm PASID %d exceeds device range %d!",
+			mm->pasid, dev->iommu->max_pasids);
+		ret = -ERANGE;
 		goto out;
 	}
 
-	ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
-	if (ret < min)
+	pasid = iommu_alloc_global_pasid_dev(dev);
+	if (pasid == IOMMU_PASID_INVALID) {
+		ret = -ENOSPC;
 		goto out;
-	mm->pasid = ret;
+	}
+	mm->pasid = pasid;
 	ret = 0;
 out:
 	mutex_unlock(&iommu_sva_lock);
@@ -63,15 +63,10 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 {
 	struct iommu_domain *domain;
 	struct iommu_sva *handle;
-	ioasid_t max_pasids;
 	int ret;
 
-	max_pasids = dev->iommu->max_pasids;
-	if (!max_pasids)
-		return ERR_PTR(-EOPNOTSUPP);
-
 	/* Allocate mm->pasid if necessary. */
-	ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
+	ret = iommu_sva_alloc_pasid(mm, dev);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -216,5 +211,5 @@ void mm_pasid_drop(struct mm_struct *mm)
 	if (likely(!mm_valid_pasid(mm)))
 		return;
 
-	ida_free(&iommu_global_pasid_ida, mm->pasid);
+	iommu_free_global_pasid(mm->pasid);
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f1dcfa3f1a1b..786cb0f3acdf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -39,6 +39,7 @@
 
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
+static DEFINE_IDA(iommu_global_pasid_ida);
 
 static unsigned int iommu_def_domain_type __read_mostly;
 static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
@@ -3393,3 +3394,26 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 
 	return domain;
 }
+
+ioasid_t iommu_alloc_global_pasid_dev(struct device *dev)
+{
+	int ret;
+	ioasid_t max;
+
+	max = dev->iommu->max_pasids;
+	ret = ida_alloc_range(&iommu_global_pasid_ida, IOMMU_FIRST_GLOBAL_PASID, max, GFP_KERNEL);
+	if (ret < 0)
+		return IOMMU_PASID_INVALID;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_alloc_global_pasid_dev);
+
+void iommu_free_global_pasid(ioasid_t pasid)
+{
+	if (WARN_ON(pasid == IOMMU_PASID_INVALID))
+		return;
+
+	ida_free(&iommu_global_pasid_ida, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d8327f2b6fcc..fb4061c97d49 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -191,6 +191,7 @@ enum iommu_dev_features {
 };
 
 #define IOMMU_DEF_RID_PASID	(0U) /* Reserved for DMA w/o PASID */
+#define IOMMU_FIRST_GLOBAL_PASID	(1U) /*starting range for allocation */
 #define IOMMU_PASID_INVALID	(-1U)
 typedef unsigned int ioasid_t;
 
@@ -722,6 +723,8 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
 struct iommu_domain *
 iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
 			       unsigned int type);
+ioasid_t iommu_alloc_global_pasid_dev(struct device *dev);
+void iommu_free_global_pasid(ioasid_t pasid);
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1083,6 +1086,13 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
 {
 	return NULL;
 }
+
+static inline ioasid_t iommu_alloc_global_pasid_dev(struct device *dev)
+{
+	return IOMMU_PASID_INVALID;
+}
+
+static inline void iommu_free_global_pasid(ioasid_t pasid) {}
 #endif /* CONFIG_IOMMU_API */
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain
  2023-05-19 20:32 [PATCH v6 0/4] Re-enable IDXD kernel workqueue under DMA API Jacob Pan
  2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
  2023-05-19 20:32 ` [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core Jacob Pan
@ 2023-05-19 20:32 ` Jacob Pan
  2023-05-29 19:48   ` Jason Gunthorpe
  2023-05-19 20:32 ` [PATCH v6 4/4] dmaengine/idxd: Re-enable kernel workqueue under DMA API Jacob Pan
  3 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2023-05-19 20:32 UTC (permalink / raw)
  To: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine, vkoul
  Cc: Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

This allows the upper layers to set a domain to a PASID of a device
if the PASID feature is supported by the IOMMU hardware. The typical
use cases are, for example, kernel DMA with PASID and hardware
assisted mediated device drivers.

The attaching device and pasid information is tracked in a per-domain
list and is used for IOTLB and devTLB invalidation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 139 +++++++++++++++++++++++++++++++++---
 drivers/iommu/intel/iommu.h |   7 ++
 2 files changed, 135 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5b4ec7cfc1a4..f5f9ad8953cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1367,6 +1367,7 @@ domain_lookup_dev_info(struct dmar_domain *domain,
 
 static void domain_update_iotlb(struct dmar_domain *domain)
 {
+	struct dev_pasid_info *dev_pasid;
 	struct device_domain_info *info;
 	bool has_iotlb_device = false;
 	unsigned long flags;
@@ -1378,6 +1379,14 @@ static void domain_update_iotlb(struct dmar_domain *domain)
 			break;
 		}
 	}
+
+	list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+		info = dev_iommu_priv_get(dev_pasid->dev);
+		if (info->ats_enabled) {
+			has_iotlb_device = true;
+			break;
+		}
+	}
 	domain->has_iotlb_device = has_iotlb_device;
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
@@ -1463,6 +1472,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 				  u64 addr, unsigned mask)
 {
+	struct dev_pasid_info *dev_pasid;
 	struct device_domain_info *info;
 	unsigned long flags;
 
@@ -1472,6 +1482,37 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 	spin_lock_irqsave(&domain->lock, flags);
 	list_for_each_entry(info, &domain->devices, link)
 		__iommu_flush_dev_iotlb(info, addr, mask);
+
+	list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+		info = dev_iommu_priv_get(dev_pasid->dev);
+		qi_flush_dev_iotlb_pasid(info->iommu,
+					 PCI_DEVID(info->bus, info->devfn),
+					 info->pfsid, dev_pasid->pasid,
+					 info->ats_qdep, addr,
+					 mask);
+	}
+	spin_unlock_irqrestore(&domain->lock, flags);
+}
+
+/*
+ * The VT-d spec requires to use PASID-based-IOTLB Invalidation to
+ * invalidate IOTLB and the paging-structure-caches for a first-stage
+ * page table.
+ */
+static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
+				     struct dmar_domain *domain, u64 addr,
+				     unsigned long npages, bool ih)
+{
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dev_pasid_info *dev_pasid;
+	unsigned long flags;
+
+	spin_lock_irqsave(&domain->lock, flags);
+	list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain)
+		qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih);
+
+	if (!list_empty(&domain->devices))
+		qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, addr, npages, ih);
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
 
@@ -1492,7 +1533,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 		ih = 1 << 6;
 
 	if (domain->use_first_level) {
-		qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, addr, pages, ih);
+		domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
 	} else {
 		unsigned long bitmask = aligned_pages - 1;
 
@@ -1562,7 +1603,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
 		u16 did = domain_id_iommu(dmar_domain, iommu);
 
 		if (dmar_domain->use_first_level)
-			qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, 0, -1, 0);
+			domain_flush_pasid_iotlb(iommu, dmar_domain, 0, -1, 0);
 		else
 			iommu->flush.flush_iotlb(iommu, did, 0, 0,
 						 DMA_TLB_DSI_FLUSH);
@@ -1734,6 +1775,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 		domain->use_first_level = true;
 	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);
+	INIT_LIST_HEAD(&domain->dev_pasids);
 	spin_lock_init(&domain->lock);
 	xa_init(&domain->iommu_array);
 
@@ -4720,25 +4762,99 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 {
 	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+	struct dev_pasid_info *curr, *dev_pasid = NULL;
+	struct dmar_domain *dmar_domain;
 	struct iommu_domain *domain;
+	unsigned long flags;
 
-	/* Domain type specific cleanup: */
 	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
-	if (domain) {
-		switch (domain->type) {
-		case IOMMU_DOMAIN_SVA:
-			intel_svm_remove_dev_pasid(dev, pasid);
-			break;
-		default:
-			/* should never reach here */
-			WARN_ON(1);
+	if (!domain)
+		goto out_tear_down;
+
+	/*
+	 * The SVA implementation needs to stop mm notification, drain the
+	 * pending page fault requests before tearing down the pasid entry.
+	 * The VT-d spec (section 6.2.3.1) also recommends that software
+	 * could use a reserved domain id for all first-only and pass-through
+	 * translations. Hence there's no need to call domain_detach_iommu()
+	 * in the sva domain case.
+	 */
+	if (domain->type == IOMMU_DOMAIN_SVA) {
+		intel_svm_remove_dev_pasid(dev, pasid);
+		goto out_tear_down;
+	}
+
+	dmar_domain = to_dmar_domain(domain);
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) {
+		if (curr->dev == dev && curr->pasid == pasid) {
+			list_del(&curr->link_domain);
+			dev_pasid = curr;
 			break;
 		}
 	}
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
 
+	domain_detach_iommu(dmar_domain, iommu);
+	kfree(dev_pasid);
+out_tear_down:
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
 }
 
+static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
+				     struct device *dev, ioasid_t pasid)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_iommu *iommu = info->iommu;
+	struct dev_pasid_info *dev_pasid;
+	unsigned long flags;
+	int ret;
+
+	if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
+		return -EOPNOTSUPP;
+
+	if (context_copied(iommu, info->bus, info->devfn))
+		return -EBUSY;
+
+	ret = prepare_domain_attach_device(domain, dev);
+	if (ret)
+		return ret;
+
+	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+	if (!dev_pasid)
+		return -ENOMEM;
+
+	ret = domain_attach_iommu(dmar_domain, iommu);
+	if (ret)
+		goto out_free;
+
+	if (domain_type_is_si(dmar_domain))
+		ret = intel_pasid_setup_pass_through(iommu, dmar_domain,
+						     dev, pasid);
+	else if (dmar_domain->use_first_level)
+		ret = domain_setup_first_level(iommu, dmar_domain,
+					       dev, pasid);
+	else
+		ret = intel_pasid_setup_second_level(iommu, dmar_domain,
+						     dev, pasid);
+	if (ret)
+		goto out_detach_iommu;
+
+	dev_pasid->dev = dev;
+	dev_pasid->pasid = pasid;
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+
+	return 0;
+out_detach_iommu:
+	domain_detach_iommu(dmar_domain, iommu);
+out_free:
+	kfree(dev_pasid);
+	return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -4758,6 +4874,7 @@ const struct iommu_ops intel_iommu_ops = {
 #endif
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev		= intel_iommu_attach_device,
+		.set_dev_pasid		= intel_iommu_set_dev_pasid,
 		.map_pages		= intel_iommu_map_pages,
 		.unmap_pages		= intel_iommu_unmap_pages,
 		.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 1c5e1d88862b..30c30e00fbdf 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -595,6 +595,7 @@ struct dmar_domain {
 
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
+	struct list_head dev_pasids;	/* all attached pasids */
 
 	struct dma_pte	*pgd;		/* virtual address */
 	int		gaw;		/* max guest address width */
@@ -717,6 +718,12 @@ struct device_domain_info {
 	struct pasid_table *pasid_table; /* pasid table */
 };
 
+struct dev_pasid_info {
+	struct list_head link_domain;	/* link to domain siblings */
+	struct device *dev;		/* the physical device */
+	ioasid_t pasid;			/* PASID of the physical device */
+};
+
 static inline void __iommu_flush_cache(
 	struct intel_iommu *iommu, void *addr, int size)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/4] dmaengine/idxd: Re-enable kernel workqueue under DMA API
  2023-05-19 20:32 [PATCH v6 0/4] Re-enable IDXD kernel workqueue under DMA API Jacob Pan
                   ` (2 preceding siblings ...)
  2023-05-19 20:32 ` [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain Jacob Pan
@ 2023-05-19 20:32 ` Jacob Pan
  2023-05-21  6:29   ` Baolu Lu
  3 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2023-05-19 20:32 UTC (permalink / raw)
  To: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine, vkoul
  Cc: Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan, Jacob Pan

Kernel workqueues were disabled due to flawed use of kernel VA and SVA
API. Now that we have the support for attaching PASID to the device's
default domain and the ability to reserve global PASIDs from SVA APIs,
we can re-enable the kernel work queues and use them under DMA API.

We also use non-privileged access for in-kernel DMA to be consistent
with the IOMMU settings. Consequently, interrupt for user privilege is
enabled for work completion IRQs.

Link:https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/dma/idxd/device.c | 30 ++++----------------
 drivers/dma/idxd/dma.c    |  5 ++--
 drivers/dma/idxd/init.c   | 60 ++++++++++++++++++++++++++++++++++++---
 drivers/dma/idxd/sysfs.c  |  7 -----
 4 files changed, 64 insertions(+), 38 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 5abbcc61c528..66b6665a45cb 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -299,21 +299,6 @@ void idxd_wqs_unmap_portal(struct idxd_device *idxd)
 	}
 }
 
-static void __idxd_wq_set_priv_locked(struct idxd_wq *wq, int priv)
-{
-	struct idxd_device *idxd = wq->idxd;
-	union wqcfg wqcfg;
-	unsigned int offset;
-
-	offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIVL_IDX);
-	spin_lock(&idxd->dev_lock);
-	wqcfg.bits[WQCFG_PRIVL_IDX] = ioread32(idxd->reg_base + offset);
-	wqcfg.priv = priv;
-	wq->wqcfg->bits[WQCFG_PRIVL_IDX] = wqcfg.bits[WQCFG_PRIVL_IDX];
-	iowrite32(wqcfg.bits[WQCFG_PRIVL_IDX], idxd->reg_base + offset);
-	spin_unlock(&idxd->dev_lock);
-}
-
 static void __idxd_wq_set_pasid_locked(struct idxd_wq *wq, int pasid)
 {
 	struct idxd_device *idxd = wq->idxd;
@@ -1423,15 +1408,14 @@ int drv_enable_wq(struct idxd_wq *wq)
 	}
 
 	/*
-	 * In the event that the WQ is configurable for pasid and priv bits.
-	 * For kernel wq, the driver should setup the pasid, pasid_en, and priv bit.
-	 * However, for non-kernel wq, the driver should only set the pasid_en bit for
-	 * shared wq. A dedicated wq that is not 'kernel' type will configure pasid and
+	 * In the event that the WQ is configurable for pasid, the driver
+	 * should setup the pasid, pasid_en bit. This is true for both kernel
+	 * and user shared workqueues. There is no need to setup priv bit in
+	 * that in-kernel DMA will also do user privileged requests.
+	 * A dedicated wq that is not 'kernel' type will configure pasid and
 	 * pasid_en later on so there is no need to setup.
 	 */
 	if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
-		int priv = 0;
-
 		if (wq_pasid_enabled(wq)) {
 			if (is_idxd_wq_kernel(wq) || wq_shared(wq)) {
 				u32 pasid = wq_dedicated(wq) ? idxd->pasid : 0;
@@ -1439,10 +1423,6 @@ int drv_enable_wq(struct idxd_wq *wq)
 				__idxd_wq_set_pasid_locked(wq, pasid);
 			}
 		}
-
-		if (is_idxd_wq_kernel(wq))
-			priv = 1;
-		__idxd_wq_set_priv_locked(wq, priv);
 	}
 
 	rc = 0;
diff --git a/drivers/dma/idxd/dma.c b/drivers/dma/idxd/dma.c
index eb35ca313684..07623fb0f52f 100644
--- a/drivers/dma/idxd/dma.c
+++ b/drivers/dma/idxd/dma.c
@@ -75,9 +75,10 @@ static inline void idxd_prep_desc_common(struct idxd_wq *wq,
 	hw->xfer_size = len;
 	/*
 	 * For dedicated WQ, this field is ignored and HW will use the WQCFG.priv
-	 * field instead. This field should be set to 1 for kernel descriptors.
+	 * field instead. This field should be set to 0 for kernel descriptors
+	 * since kernel DMA on VT-d supports "user" privilege only.
 	 */
-	hw->priv = 1;
+	hw->priv = 0;
 	hw->completion_addr = compl;
 }
 
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 1aa823974cda..bd7b9bd40f0a 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -550,14 +550,65 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-	return -EOPNOTSUPP;
+	struct pci_dev *pdev = idxd->pdev;
+	struct device *dev = &pdev->dev;
+	struct iommu_domain *domain;
+	union gencfg_reg gencfg;
+	ioasid_t pasid;
+	int ret;
+
+	/*
+	 * Attach a global PASID to the DMA domain so that we can use ENQCMDS
+	 * to submit work on buffers mapped by DMA API.
+	 */
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EPERM;
+
+	pasid = iommu_alloc_global_pasid_dev(dev);
+	if (pasid == IOMMU_PASID_INVALID)
+		return -ENOSPC;
+
+	/*
+	 * DMA domain is owned by the driver, it should support all valid
+	 * types such as DMA-FQ, identity, etc.
+	 */
+	ret = iommu_attach_device_pasid(domain, dev, pasid);
+	if (ret) {
+		dev_err(dev, "failed to attach device pasid %d, domain type %d",
+			pasid, domain->type);
+		iommu_free_global_pasid(pasid);
+		return ret;
+	}
+
+	/* Since we set user privilege for kernel DMA, enable completion IRQ */
+	gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+	gencfg.user_int_en = 1;
+	iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
+	idxd->pasid = pasid;
+
+	return ret;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
+	struct pci_dev *pdev = idxd->pdev;
+	struct device *dev = &pdev->dev;
+	struct iommu_domain *domain;
+	union gencfg_reg gencfg;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return;
+
+	iommu_detach_device_pasid(domain, dev, idxd->pasid);
+	iommu_free_global_pasid(idxd->pasid);
 
-	iommu_sva_unbind_device(idxd->sva);
+	gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+	gencfg.user_int_en = 0;
+	iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
 	idxd->sva = NULL;
+	idxd->pasid = IOMMU_PASID_INVALID;
 }
 
 static int idxd_enable_sva(struct pci_dev *pdev)
@@ -600,8 +651,9 @@ static int idxd_probe(struct idxd_device *idxd)
 		} else {
 			set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
 
-			if (idxd_enable_system_pasid(idxd))
-				dev_warn(dev, "No in-kernel DMA with PASID.\n");
+			rc = idxd_enable_system_pasid(idxd);
+			if (rc)
+				dev_warn(dev, "No in-kernel DMA with PASID. %d\n", rc);
 			else
 				set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
 		}
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 293739ac5596..63f6966c51aa 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -948,13 +948,6 @@ static ssize_t wq_name_store(struct device *dev,
 	if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
 		return -EINVAL;
 
-	/*
-	 * This is temporarily placed here until we have SVM support for
-	 * dmaengine.
-	 */
-	if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
-		return -EOPNOTSUPP;
-
 	input = kstrndup(buf, count, GFP_KERNEL);
 	if (!input)
 		return -ENOMEM;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core
  2023-05-19 20:32 ` [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core Jacob Pan
@ 2023-05-21  6:21   ` Baolu Lu
  2023-05-22 17:32     ` Jacob Pan
  2023-05-29 19:43   ` Jason Gunthorpe
  1 sibling, 1 reply; 17+ messages in thread
From: Baolu Lu @ 2023-05-21  6:21 UTC (permalink / raw)
  To: Jacob Pan, LKML, iommu, Jason Gunthorpe, Joerg Roedel, dmaengine, vkoul
  Cc: baolu.lu, Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok,
	Tian, Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi,
	Tom, narayan.ranganathan

On 5/20/23 4:32 AM, Jacob Pan wrote:
> Global PASID can be used beyond SVA. For example, drivers that use
> Intel ENQCMD to submit work must use global PASIDs in that PASID
> is stored in a per CPU MSR. When such device need to submit work
> for in-kernel DMA with PASID, it must allocate PASIDs from the same
> global number space to avoid conflict.
> 
> This patch moves global PASID allocation APIs from SVA to IOMMU APIs.
> Reserved PASIDs, currently only RID_PASID, are excluded from the global
> PASID allocation.
> 
> It is expected that device drivers will use the allocated PASIDs to
> attach to appropriate IOMMU domains for use.
> 
> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
> ---
> v6: explicitly exclude reserved a range from SVA PASID allocation
>      check mm PASID compatibility with device
> v5: move PASID range check inside API so that device drivers only pass
>      in struct device* (Kevin)
> v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
> ---
>   drivers/iommu/iommu-sva.c | 33 ++++++++++++++-------------------
>   drivers/iommu/iommu.c     | 24 ++++++++++++++++++++++++
>   include/linux/iommu.h     | 10 ++++++++++
>   3 files changed, 48 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 9821bc44f5ac..7fe8e977d8eb 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -10,33 +10,33 @@
>   #include "iommu-sva.h"
>   
>   static DEFINE_MUTEX(iommu_sva_lock);
> -static DEFINE_IDA(iommu_global_pasid_ida);
>   
>   /* Allocate a PASID for the mm within range (inclusive) */
> -static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> +static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
>   {
> +	ioasid_t pasid;
>   	int ret = 0;
>   
> -	if (min == IOMMU_PASID_INVALID ||
> -	    max == IOMMU_PASID_INVALID ||
> -	    min == 0 || max < min)
> -		return -EINVAL;
> -
>   	if (!arch_pgtable_dma_compat(mm))
>   		return -EBUSY;
>   
>   	mutex_lock(&iommu_sva_lock);
>   	/* Is a PASID already associated with this mm? */
>   	if (mm_valid_pasid(mm)) {
> -		if (mm->pasid < min || mm->pasid > max)
> -			ret = -EOVERFLOW;
> +		if (mm->pasid <= dev->iommu->max_pasids)
> +			goto out;
> +		dev_err(dev, "current mm PASID %d exceeds device range %d!",
> +			mm->pasid, dev->iommu->max_pasids);
> +		ret = -ERANGE;
>   		goto out;
>   	}

Nit: Above is just refactoring, so it's better to keep the code behavior
consistent. For example, no need to change the error# from -EOVERFLOW to
-ERANGE, and no need to leave a new kernel message.

Anyway, if you think these changes are helpful, it's better to have them
in separated patches.

In the end, perhaps we can simply have code like this:

	if (mm_valid_pasid(mm)) {
		if (mm->pasid > dev->iommu->max_pasids)
			ret = -EOVERFLOW;
		goto out;
	}

Others look good to me, with above addressed,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID
  2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
@ 2023-05-21  6:21   ` Baolu Lu
  2023-05-23 14:47   ` Jean-Philippe Brucker
  2023-05-29 19:50   ` Jason Gunthorpe
  2 siblings, 0 replies; 17+ messages in thread
From: Baolu Lu @ 2023-05-21  6:21 UTC (permalink / raw)
  To: Jacob Pan, LKML, iommu, Jason Gunthorpe, Joerg Roedel, dmaengine, vkoul
  Cc: baolu.lu, Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok,
	Tian, Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi,
	Tom, narayan.ranganathan

On 5/20/23 4:32 AM, Jacob Pan wrote:
> PCIe Process address space ID (PASID) is used to tag DMA traffic, it
> provides finer grained isolation than requester ID (RID).
> 
> For each RID, 0 is as a special PASID for the legacy DMA (without
> PASID), thus RID_PASID. This is universal across all architectures,
> therefore warranted to be declared in the common header.
> Noting that VT-d could support none-zero RID_PASID, but currently not
> used.
> 
> By having a common RID_PASID, we can avoid conflicts between different
> use cases in the generic code. e.g. SVA and DMA API with PASIDs.
> 
> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/4] dmaengine/idxd: Re-enable kernel workqueue under DMA API
  2023-05-19 20:32 ` [PATCH v6 4/4] dmaengine/idxd: Re-enable kernel workqueue under DMA API Jacob Pan
@ 2023-05-21  6:29   ` Baolu Lu
  0 siblings, 0 replies; 17+ messages in thread
From: Baolu Lu @ 2023-05-21  6:29 UTC (permalink / raw)
  To: Jacob Pan, LKML, iommu, Jason Gunthorpe, Joerg Roedel, dmaengine, vkoul
  Cc: baolu.lu, Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok,
	Tian, Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi,
	Tom, narayan.ranganathan

On 5/20/23 4:32 AM, Jacob Pan wrote:
> Kernel workqueues were disabled due to flawed use of kernel VA and SVA
> API. Now that we have the support for attaching PASID to the device's
> default domain and the ability to reserve global PASIDs from SVA APIs,
> we can re-enable the kernel work queues and use them under DMA API.
> 
> We also use non-privileged access for in-kernel DMA to be consistent
> with the IOMMU settings. Consequently, interrupt for user privilege is
> enabled for work completion IRQs.
> 
> Link:https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/
> Reviewed-by: Dave Jiang<dave.jiang@intel.com>
> Reviewed-by: Fenghua Yu<fenghua.yu@intel.com>
> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core
  2023-05-21  6:21   ` Baolu Lu
@ 2023-05-22 17:32     ` Jacob Pan
  0 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2023-05-22 17:32 UTC (permalink / raw)
  To: Baolu Lu
  Cc: LKML, iommu, Jason Gunthorpe, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan, jacob.jun.pan

Hi Baolu,

On Sun, 21 May 2023 14:21:25 +0800, Baolu Lu <baolu.lu@linux.intel.com>
wrote:

> On 5/20/23 4:32 AM, Jacob Pan wrote:
> > Global PASID can be used beyond SVA. For example, drivers that use
> > Intel ENQCMD to submit work must use global PASIDs in that PASID
> > is stored in a per CPU MSR. When such device need to submit work
> > for in-kernel DMA with PASID, it must allocate PASIDs from the same
> > global number space to avoid conflict.
> > 
> > This patch moves global PASID allocation APIs from SVA to IOMMU APIs.
> > Reserved PASIDs, currently only RID_PASID, are excluded from the global
> > PASID allocation.
> > 
> > It is expected that device drivers will use the allocated PASIDs to
> > attach to appropriate IOMMU domains for use.
> > 
> > Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
> > ---
> > v6: explicitly exclude reserved a range from SVA PASID allocation
> >      check mm PASID compatibility with device
> > v5: move PASID range check inside API so that device drivers only pass
> >      in struct device* (Kevin)
> > v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
> > ---
> >   drivers/iommu/iommu-sva.c | 33 ++++++++++++++-------------------
> >   drivers/iommu/iommu.c     | 24 ++++++++++++++++++++++++
> >   include/linux/iommu.h     | 10 ++++++++++
> >   3 files changed, 48 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> > index 9821bc44f5ac..7fe8e977d8eb 100644
> > --- a/drivers/iommu/iommu-sva.c
> > +++ b/drivers/iommu/iommu-sva.c
> > @@ -10,33 +10,33 @@
> >   #include "iommu-sva.h"
> >   
> >   static DEFINE_MUTEX(iommu_sva_lock);
> > -static DEFINE_IDA(iommu_global_pasid_ida);
> >   
> >   /* Allocate a PASID for the mm within range (inclusive) */
> > -static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min,
> > ioasid_t max) +static int iommu_sva_alloc_pasid(struct mm_struct *mm,
> > struct device *dev) {
> > +	ioasid_t pasid;
> >   	int ret = 0;
> >   
> > -	if (min == IOMMU_PASID_INVALID ||
> > -	    max == IOMMU_PASID_INVALID ||
> > -	    min == 0 || max < min)
> > -		return -EINVAL;
> > -
> >   	if (!arch_pgtable_dma_compat(mm))
> >   		return -EBUSY;
> >   
> >   	mutex_lock(&iommu_sva_lock);
> >   	/* Is a PASID already associated with this mm? */
> >   	if (mm_valid_pasid(mm)) {
> > -		if (mm->pasid < min || mm->pasid > max)
> > -			ret = -EOVERFLOW;
> > +		if (mm->pasid <= dev->iommu->max_pasids)
> > +			goto out;
> > +		dev_err(dev, "current mm PASID %d exceeds device range
> > %d!",
> > +			mm->pasid, dev->iommu->max_pasids);
> > +		ret = -ERANGE;
> >   		goto out;
> >   	}  
> 
> Nit: Above is just refactoring, so it's better to keep the code behavior
> consistent. For example, no need to change the error# from -EOVERFLOW to
> -ERANGE, and no need to leave a new kernel message.
> 
> Anyway, if you think these changes are helpful, it's better to have them
> in separated patches.
> 
> In the end, perhaps we can simply have code like this:
> 
> 	if (mm_valid_pasid(mm)) {
> 		if (mm->pasid > dev->iommu->max_pasids)
> 			ret = -EOVERFLOW;
> 		goto out;
> 	}
> 
> Others look good to me, with above addressed,
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> 
much better, will fix.

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID
  2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
  2023-05-21  6:21   ` Baolu Lu
@ 2023-05-23 14:47   ` Jean-Philippe Brucker
  2023-05-23 15:26     ` Jacob Pan
  2023-05-29 19:50   ` Jason Gunthorpe
  2 siblings, 1 reply; 17+ messages in thread
From: Jean-Philippe Brucker @ 2023-05-23 14:47 UTC (permalink / raw)
  To: Jacob Pan
  Cc: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine,
	vkoul, Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok,
	Tian, Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi,
	Tom, narayan.ranganathan

Hi Jacob,

On Fri, May 19, 2023 at 01:32:20PM -0700, Jacob Pan wrote:
> PCIe Process address space ID (PASID) is used to tag DMA traffic, it
> provides finer grained isolation than requester ID (RID).
> 
> For each RID, 0 is as a special PASID for the legacy DMA (without
> PASID), thus RID_PASID. This is universal across all architectures,
> therefore warranted to be declared in the common header.
> Noting that VT-d could support none-zero RID_PASID, but currently not
> used.
> 
> By having a common RID_PASID, we can avoid conflicts between different
> use cases in the generic code. e.g. SVA and DMA API with PASIDs.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v6:
>    - let SMMU code use the common RID_PASID macro
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 10 ++++----
>  drivers/iommu/intel/iommu.c                   | 24 +++++++++----------
>  drivers/iommu/intel/pasid.c                   |  2 +-
>  drivers/iommu/intel/pasid.h                   |  1 -
>  include/linux/iommu.h                         |  1 +
>  6 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index a5a63b1c947e..160b31e6239d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -80,7 +80,7 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>  	 * be some overlap between use of both ASIDs, until we invalidate the
>  	 * TLB.
>  	 */
> -	arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
> +	arm_smmu_write_ctx_desc(smmu_domain, IOMMU_DEF_RID_PASID, cd);

I agree with reserving 0 globally for non-PASID DMA, but could we call
this something more generic, like IOMMU_NO_PASID?  The term "RID_PASID" is
specific to VT-d and "RID" to PCI, so it looks confusing here (this driver
also supports non-PCI). "NO_PASID" would be clearer to someone just trying
to follow this driver code.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID
  2023-05-23 14:47   ` Jean-Philippe Brucker
@ 2023-05-23 15:26     ` Jacob Pan
  0 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2023-05-23 15:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: LKML, iommu, Jason Gunthorpe, Lu Baolu, Joerg Roedel, dmaengine,
	vkoul, Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok,
	Tian, Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi,
	Tom, narayan.ranganathan, jacob.jun.pan

Hi Jean,

On Tue, 23 May 2023 15:47:33 +0100, Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:

> Hi Jacob,
> 
> On Fri, May 19, 2023 at 01:32:20PM -0700, Jacob Pan wrote:
> > PCIe Process address space ID (PASID) is used to tag DMA traffic, it
> > provides finer grained isolation than requester ID (RID).
> > 
> > For each RID, 0 is as a special PASID for the legacy DMA (without
> > PASID), thus RID_PASID. This is universal across all architectures,
> > therefore warranted to be declared in the common header.
> > Noting that VT-d could support none-zero RID_PASID, but currently not
> > used.
> > 
> > By having a common RID_PASID, we can avoid conflicts between different
> > use cases in the generic code. e.g. SVA and DMA API with PASIDs.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v6:
> >    - let SMMU code use the common RID_PASID macro
> > ---
> >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +-
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 10 ++++----
> >  drivers/iommu/intel/iommu.c                   | 24 +++++++++----------
> >  drivers/iommu/intel/pasid.c                   |  2 +-
> >  drivers/iommu/intel/pasid.h                   |  1 -
> >  include/linux/iommu.h                         |  1 +
> >  6 files changed, 20 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index
> > a5a63b1c947e..160b31e6239d 100644 ---
> > a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -80,7 +80,7 @@
> > arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
> >  	 * be some overlap between use of both ASIDs, until we
> > invalidate the
> >  	 * TLB.
> >  	 */
> > -	arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
> > +	arm_smmu_write_ctx_desc(smmu_domain, IOMMU_DEF_RID_PASID, cd);
> >  
> 
> I agree with reserving 0 globally for non-PASID DMA, but could we call
> this something more generic, like IOMMU_NO_PASID?  The term "RID_PASID" is
> specific to VT-d and "RID" to PCI, so it looks confusing here (this driver
> also supports non-PCI). "NO_PASID" would be clearer to someone just trying
> to follow this driver code.
> 
Sounds good, it is for DMA w/o PASID.


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core
  2023-05-19 20:32 ` [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core Jacob Pan
  2023-05-21  6:21   ` Baolu Lu
@ 2023-05-29 19:43   ` Jason Gunthorpe
  1 sibling, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-29 19:43 UTC (permalink / raw)
  To: Jacob Pan
  Cc: LKML, iommu, Lu Baolu, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan

On Fri, May 19, 2023 at 01:32:21PM -0700, Jacob Pan wrote:
> Global PASID can be used beyond SVA. For example, drivers that use
> Intel ENQCMD to submit work must use global PASIDs in that PASID
> is stored in a per CPU MSR. When such device need to submit work
> for in-kernel DMA with PASID, it must allocate PASIDs from the same
> global number space to avoid conflict.
> 
> This patch moves global PASID allocation APIs from SVA to IOMMU APIs.
> Reserved PASIDs, currently only RID_PASID, are excluded from the global
> PASID allocation.
> 
> It is expected that device drivers will use the allocated PASIDs to
> attach to appropriate IOMMU domains for use.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v6: explicitly exclude reserved a range from SVA PASID allocation
>     check mm PASID compatibility with device
> v5: move PASID range check inside API so that device drivers only pass
>     in struct device* (Kevin)
> v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
> ---
>  drivers/iommu/iommu-sva.c | 33 ++++++++++++++-------------------
>  drivers/iommu/iommu.c     | 24 ++++++++++++++++++++++++
>  include/linux/iommu.h     | 10 ++++++++++
>  3 files changed, 48 insertions(+), 19 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain
  2023-05-19 20:32 ` [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain Jacob Pan
@ 2023-05-29 19:48   ` Jason Gunthorpe
  2023-05-30  2:19     ` Baolu Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-29 19:48 UTC (permalink / raw)
  To: Jacob Pan
  Cc: LKML, iommu, Lu Baolu, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan

On Fri, May 19, 2023 at 01:32:22PM -0700, Jacob Pan wrote:

> @@ -4720,25 +4762,99 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>  static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>  {
>  	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> +	struct dev_pasid_info *curr, *dev_pasid = NULL;
> +	struct dmar_domain *dmar_domain;
>  	struct iommu_domain *domain;
> +	unsigned long flags;
>  
> -	/* Domain type specific cleanup: */
>  	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> -	if (domain) {
> -		switch (domain->type) {
> -		case IOMMU_DOMAIN_SVA:
> -			intel_svm_remove_dev_pasid(dev, pasid);
> -			break;
> -		default:
> -			/* should never reach here */
> -			WARN_ON(1);
> +	if (!domain)
> +		goto out_tear_down;
> +
> +	/*
> +	 * The SVA implementation needs to stop mm notification, drain the
> +	 * pending page fault requests before tearing down the pasid entry.
> +	 * The VT-d spec (section 6.2.3.1) also recommends that software
> +	 * could use a reserved domain id for all first-only and pass-through
> +	 * translations. Hence there's no need to call domain_detach_iommu()
> +	 * in the sva domain case.
> +	 */
> +	if (domain->type == IOMMU_DOMAIN_SVA) {
> +		intel_svm_remove_dev_pasid(dev, pasid);
> +		goto out_tear_down;
> +	}

But why don't you need to do all the other
intel_pasid_tear_down_entry(), intel_svm_drain_prq() (which is
misnamed) and other stuff from intel_svm_remove_dev_pasid() ?

There still seems to be waaay too much "SVM" in the PASID code.

> +static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
> +				     struct device *dev, ioasid_t pasid)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct intel_iommu *iommu = info->iommu;
> +	struct dev_pasid_info *dev_pasid;
> +	unsigned long flags;
> +	int ret;
> +
> +	if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
> +		return -EOPNOTSUPP;
> +
> +	if (context_copied(iommu, info->bus, info->devfn))
> +		return -EBUSY;
> +
> +	ret = prepare_domain_attach_device(domain, dev);
> +	if (ret)
> +		return ret;
> +
> +	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> +	if (!dev_pasid)
> +		return -ENOMEM;
> +
> +	ret = domain_attach_iommu(dmar_domain, iommu);
> +	if (ret)
> +		goto out_free;
> +
> +	if (domain_type_is_si(dmar_domain))
> +		ret = intel_pasid_setup_pass_through(iommu, dmar_domain,
> +						     dev, pasid);
> +	else if (dmar_domain->use_first_level)
> +		ret = domain_setup_first_level(iommu, dmar_domain,
> +					       dev, pasid);
> +	else
> +		ret = intel_pasid_setup_second_level(iommu, dmar_domain,
> +						     dev, pasid);

It would be nice if the different domain types had their own ops..

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID
  2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
  2023-05-21  6:21   ` Baolu Lu
  2023-05-23 14:47   ` Jean-Philippe Brucker
@ 2023-05-29 19:50   ` Jason Gunthorpe
  2 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-29 19:50 UTC (permalink / raw)
  To: Jacob Pan
  Cc: LKML, iommu, Lu Baolu, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan

On Fri, May 19, 2023 at 01:32:20PM -0700, Jacob Pan wrote:
> PCIe Process address space ID (PASID) is used to tag DMA traffic, it
> provides finer grained isolation than requester ID (RID).
> 
> For each RID, 0 is as a special PASID for the legacy DMA (without
> PASID), thus RID_PASID. This is universal across all architectures,
> therefore warranted to be declared in the common header.
> Noting that VT-d could support none-zero RID_PASID, but currently not
> used.
> 
> By having a common RID_PASID, we can avoid conflicts between different
> use cases in the generic code. e.g. SVA and DMA API with PASIDs.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v6:
>    - let SMMU code use the common RID_PASID macro
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 10 ++++----
>  drivers/iommu/intel/iommu.c                   | 24 +++++++++----------
>  drivers/iommu/intel/pasid.c                   |  2 +-
>  drivers/iommu/intel/pasid.h                   |  1 -
>  include/linux/iommu.h                         |  1 +
>  6 files changed, 20 insertions(+), 20 deletions(-)

I tend to agree with Jean-Philippe's remarks on the name

This is the "untagged" DMA for environments that have optional tags.

Otherwise:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain
  2023-05-29 19:48   ` Jason Gunthorpe
@ 2023-05-30  2:19     ` Baolu Lu
  2023-05-30 16:55       ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Baolu Lu @ 2023-05-30  2:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Jacob Pan
  Cc: baolu.lu, LKML, iommu, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan

On 5/30/23 3:48 AM, Jason Gunthorpe wrote:
> On Fri, May 19, 2023 at 01:32:22PM -0700, Jacob Pan wrote:
> 
>> @@ -4720,25 +4762,99 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>>   static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>>   {
>>   	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>> +	struct dev_pasid_info *curr, *dev_pasid = NULL;
>> +	struct dmar_domain *dmar_domain;
>>   	struct iommu_domain *domain;
>> +	unsigned long flags;
>>   
>> -	/* Domain type specific cleanup: */
>>   	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
>> -	if (domain) {
>> -		switch (domain->type) {
>> -		case IOMMU_DOMAIN_SVA:
>> -			intel_svm_remove_dev_pasid(dev, pasid);
>> -			break;
>> -		default:
>> -			/* should never reach here */
>> -			WARN_ON(1);
>> +	if (!domain)
>> +		goto out_tear_down;
>> +
>> +	/*
>> +	 * The SVA implementation needs to stop mm notification, drain the
>> +	 * pending page fault requests before tearing down the pasid entry.
>> +	 * The VT-d spec (section 6.2.3.1) also recommends that software
>> +	 * could use a reserved domain id for all first-only and pass-through
>> +	 * translations. Hence there's no need to call domain_detach_iommu()
>> +	 * in the sva domain case.
>> +	 */
>> +	if (domain->type == IOMMU_DOMAIN_SVA) {
>> +		intel_svm_remove_dev_pasid(dev, pasid);
>> +		goto out_tear_down;
>> +	}
> 
> But why don't you need to do all the other
> intel_pasid_tear_down_entry(), intel_svm_drain_prq() (which is
> misnamed) and other stuff from intel_svm_remove_dev_pasid() ?

Perhaps,

	if (domain->type == IOMMU_DOMAIN_SVA) {
		intel_svm_remove_dev_pasid(dev, pasid);
		return;
	}

?

> 
> There still seems to be waaay too much "SVM" in the PASID code.

This segment of code is destined to be temporary. From a long-term
perspective, I hope to move SVA specific staffs such as mm notification,
prq draining, etc. to the iommu core. They are generic rather than Intel
iommu specific.

After the code consolidation done, the code here will become simpler and
appealing. We just need to tear down the pasid entry.

> 
>> +static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
>> +				     struct device *dev, ioasid_t pasid)
>> +{
>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	struct intel_iommu *iommu = info->iommu;
>> +	struct dev_pasid_info *dev_pasid;
>> +	unsigned long flags;
>> +	int ret;
>> +
>> +	if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (context_copied(iommu, info->bus, info->devfn))
>> +		return -EBUSY;
>> +
>> +	ret = prepare_domain_attach_device(domain, dev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
>> +	if (!dev_pasid)
>> +		return -ENOMEM;
>> +
>> +	ret = domain_attach_iommu(dmar_domain, iommu);
>> +	if (ret)
>> +		goto out_free;
>> +
>> +	if (domain_type_is_si(dmar_domain))
>> +		ret = intel_pasid_setup_pass_through(iommu, dmar_domain,
>> +						     dev, pasid);
>> +	else if (dmar_domain->use_first_level)
>> +		ret = domain_setup_first_level(iommu, dmar_domain,
>> +					       dev, pasid);
>> +	else
>> +		ret = intel_pasid_setup_second_level(iommu, dmar_domain,
>> +						     dev, pasid);
> 
> It would be nice if the different domain types had their own ops..

Good suggestion!

We can add a domain ops in the Intel domain structure which is
responsible for how to install an Intel iommu domain onto the VT-d
hardware.

It worth a separated refactoring series. Let me do it afterward.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain
  2023-05-30  2:19     ` Baolu Lu
@ 2023-05-30 16:55       ` Jason Gunthorpe
  2023-05-31  4:02         ` Baolu Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-30 16:55 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jacob Pan, LKML, iommu, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan

On Tue, May 30, 2023 at 10:19:05AM +0800, Baolu Lu wrote:
> On 5/30/23 3:48 AM, Jason Gunthorpe wrote:
> > On Fri, May 19, 2023 at 01:32:22PM -0700, Jacob Pan wrote:
> > 
> > > @@ -4720,25 +4762,99 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
> > >   static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> > >   {
> > >   	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> > > +	struct dev_pasid_info *curr, *dev_pasid = NULL;
> > > +	struct dmar_domain *dmar_domain;
> > >   	struct iommu_domain *domain;
> > > +	unsigned long flags;
> > > -	/* Domain type specific cleanup: */
> > >   	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> > > -	if (domain) {
> > > -		switch (domain->type) {
> > > -		case IOMMU_DOMAIN_SVA:
> > > -			intel_svm_remove_dev_pasid(dev, pasid);
> > > -			break;
> > > -		default:
> > > -			/* should never reach here */
> > > -			WARN_ON(1);
> > > +	if (!domain)
> > > +		goto out_tear_down;
> > > +
> > > +	/*
> > > +	 * The SVA implementation needs to stop mm notification, drain the
> > > +	 * pending page fault requests before tearing down the pasid entry.
> > > +	 * The VT-d spec (section 6.2.3.1) also recommends that software
> > > +	 * could use a reserved domain id for all first-only and pass-through
> > > +	 * translations. Hence there's no need to call domain_detach_iommu()
> > > +	 * in the sva domain case.
> > > +	 */
> > > +	if (domain->type == IOMMU_DOMAIN_SVA) {
> > > +		intel_svm_remove_dev_pasid(dev, pasid);
> > > +		goto out_tear_down;
> > > +	}
> > 
> > But why don't you need to do all the other
> > intel_pasid_tear_down_entry(), intel_svm_drain_prq() (which is
> > misnamed) and other stuff from intel_svm_remove_dev_pasid() ?
> 
> Perhaps,
> 
> 	if (domain->type == IOMMU_DOMAIN_SVA) {
> 		intel_svm_remove_dev_pasid(dev, pasid);
> 		return;
> 	}
> 
> ?

I would expect only stuff directly connected to SVM be in the SVM
function.

De-initalizing PRI and any other pasid destruction should be in this
function.

> > There still seems to be waaay too much "SVM" in the PASID code.
> 
> This segment of code is destined to be temporary. From a long-term
> perspective, I hope to move SVA specific staffs such as mm notification,
> prq draining, etc. to the iommu core. They are generic rather than Intel
> iommu specific.

Yes, sort of, but.. That is just the mmu notifier bits

All the PRI/PASID teardown needs to be unlinked from SVM

> > It would be nice if the different domain types had their own ops..
> 
> Good suggestion!
> 
> We can add a domain ops in the Intel domain structure which is
> responsible for how to install an Intel iommu domain onto the VT-d
> hardware.

We should have seperate iommu_domain_ops at least, I think that would
cover alot of it?

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain
  2023-05-30 16:55       ` Jason Gunthorpe
@ 2023-05-31  4:02         ` Baolu Lu
  0 siblings, 0 replies; 17+ messages in thread
From: Baolu Lu @ 2023-05-31  4:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Jacob Pan, LKML, iommu, Joerg Roedel, dmaengine, vkoul,
	Robin Murphy, Will Deacon, David Woodhouse, Raj Ashok, Tian,
	Kevin, Yi Liu, Yu, Fenghua, Dave Jiang, Tony Luck, Zanussi, Tom,
	narayan.ranganathan

On 5/31/23 12:55 AM, Jason Gunthorpe wrote:
> On Tue, May 30, 2023 at 10:19:05AM +0800, Baolu Lu wrote:
>> On 5/30/23 3:48 AM, Jason Gunthorpe wrote:
>>> On Fri, May 19, 2023 at 01:32:22PM -0700, Jacob Pan wrote:
>>>
>>>> @@ -4720,25 +4762,99 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>>>>    static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>>>>    {
>>>>    	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>>>> +	struct dev_pasid_info *curr, *dev_pasid = NULL;
>>>> +	struct dmar_domain *dmar_domain;
>>>>    	struct iommu_domain *domain;
>>>> +	unsigned long flags;
>>>> -	/* Domain type specific cleanup: */
>>>>    	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
>>>> -	if (domain) {
>>>> -		switch (domain->type) {
>>>> -		case IOMMU_DOMAIN_SVA:
>>>> -			intel_svm_remove_dev_pasid(dev, pasid);
>>>> -			break;
>>>> -		default:
>>>> -			/* should never reach here */
>>>> -			WARN_ON(1);
>>>> +	if (!domain)
>>>> +		goto out_tear_down;
>>>> +
>>>> +	/*
>>>> +	 * The SVA implementation needs to stop mm notification, drain the
>>>> +	 * pending page fault requests before tearing down the pasid entry.
>>>> +	 * The VT-d spec (section 6.2.3.1) also recommends that software
>>>> +	 * could use a reserved domain id for all first-only and pass-through
>>>> +	 * translations. Hence there's no need to call domain_detach_iommu()
>>>> +	 * in the sva domain case.
>>>> +	 */
>>>> +	if (domain->type == IOMMU_DOMAIN_SVA) {
>>>> +		intel_svm_remove_dev_pasid(dev, pasid);
>>>> +		goto out_tear_down;
>>>> +	}
>>>
>>> But why don't you need to do all the other
>>> intel_pasid_tear_down_entry(), intel_svm_drain_prq() (which is
>>> misnamed) and other stuff from intel_svm_remove_dev_pasid() ?
>>
>> Perhaps,
>>
>> 	if (domain->type == IOMMU_DOMAIN_SVA) {
>> 		intel_svm_remove_dev_pasid(dev, pasid);
>> 		return;
>> 	}
>>
>> ?
> 
> I would expect only stuff directly connected to SVM be in the SVM
> function.
> 
> De-initalizing PRI and any other pasid destruction should be in this
> function.
> 
>>> There still seems to be waaay too much "SVM" in the PASID code.
>>
>> This segment of code is destined to be temporary. From a long-term
>> perspective, I hope to move SVA specific staffs such as mm notification,
>> prq draining, etc. to the iommu core. They are generic rather than Intel
>> iommu specific.
> 
> Yes, sort of, but.. That is just the mmu notifier bits
> 
> All the PRI/PASID teardown needs to be unlinked from SVM

Get your point now. Yes. PRI and PASID teardown are not SVA-specific.
Sorry that we should rename SVM to SVA to unify the Linux terminology.

> 
>>> It would be nice if the different domain types had their own ops..
>>
>> Good suggestion!
>>
>> We can add a domain ops in the Intel domain structure which is
>> responsible for how to install an Intel iommu domain onto the VT-d
>> hardware.
> 
> We should have seperate iommu_domain_ops at least, I think that would
> cover alot of it?

Are you suggesting adding this ops in common iommu_domain or intel's
dmar_domain? My understanding is the latter. To do so, probably we need
to define various callbacks for different type of domains: identity,
blocking, dma remapping, sva and possibly nested. Also need to care
about legacy vs. scalable mode.

That's the reason why I hoped to do all these in separated series with
carefully reviewing and testing.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-05-31  4:04 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19 20:32 [PATCH v6 0/4] Re-enable IDXD kernel workqueue under DMA API Jacob Pan
2023-05-19 20:32 ` [PATCH v6 1/4] iommu: Generalize default PCIe requester ID PASID Jacob Pan
2023-05-21  6:21   ` Baolu Lu
2023-05-23 14:47   ` Jean-Philippe Brucker
2023-05-23 15:26     ` Jacob Pan
2023-05-29 19:50   ` Jason Gunthorpe
2023-05-19 20:32 ` [PATCH v6 2/4] iommu: Move global PASID allocation from SVA to core Jacob Pan
2023-05-21  6:21   ` Baolu Lu
2023-05-22 17:32     ` Jacob Pan
2023-05-29 19:43   ` Jason Gunthorpe
2023-05-19 20:32 ` [PATCH v6 3/4] iommu/vt-d: Add set_dev_pasid callback for dma domain Jacob Pan
2023-05-29 19:48   ` Jason Gunthorpe
2023-05-30  2:19     ` Baolu Lu
2023-05-30 16:55       ` Jason Gunthorpe
2023-05-31  4:02         ` Baolu Lu
2023-05-19 20:32 ` [PATCH v6 4/4] dmaengine/idxd: Re-enable kernel workqueue under DMA API Jacob Pan
2023-05-21  6:29   ` Baolu Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).