linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device
@ 2018-07-22  6:09 Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu

Hi,

The Mediate Device is a framework for fine-grained physical device
sharing across the isolated domains. Currently the mdev framework
is designed to be independent of the platform IOMMU support. As the
result, the DMA isolation relies on the mdev parent device in a
vendor specific way.

There are several cases where a mediated device could be protected
and isolated by the platform IOMMU. For example, Intel vt-d rev3.0
[1] introduces a new translation mode called 'scalable mode', which
enables PASID-granular translations. The vt-d scalable mode is the
key ingredient for Scalable I/O Virtualization [2] [3] which allows
sharing a device in minimal possible granularity (ADI - Assignable
Device Interface).

A mediated device backed by an ADI could be protected and isolated
by the IOMMU since 1) the parent device supports tagging an unique
PASID to all DMA traffic out of the mediated device; and 2) the DMA
translation unit (IOMMU) supports the PASID granular translation.
We can apply IOMMU protection and isolation to this kind of devices
just as what we are doing with an assignable PCI device.

In order to distinguish the IOMMU-capable mediated devices from those
which still need to rely on parent devices, this patch set adds a
domain type attribute to each mdev.

enum mdev_domain_type {
       DOMAIN_TYPE_EXTERNAL,   /* Use the external domain and all
                                * IOMMU staff controlled by the
                                * parent device driver.
                                */
       DOMAIN_TYPE_INHERITANCE,/* Use the same domain as the parent device. */
       DOMAIN_TYPE_PRIVATE,    /* Capable of having a private domain. For an
                                * example, the parent device is able to bind
                                * a specific PASID for a mediated device and
                                * transfering data with the asigned PASID.
                                */
};

The mdev parent device driver could opt-in whether an mdev is IOMMU
capable when the device is created by invoking below interface within
its @create callback:

int mdev_set_domain_type(struct device *dev,
                         enum mdev_domain_type type);

In vfio_iommu_type1_attach_group(), a domain will be found and assigned
to the group according to the domain type attributes of mdev devices in
the group.

Besides above, we still need below changes to sport IOMMU-capable mdev
device:
a) The platform specific iommu ops should be set to mdev bus. So that,
   the vfio/mdev framework could call iommu APIs.
b) The iommu driver should be changed to support setting up the translation
   structures for a mediated device.

This patch series extends both IOMMU and vfio components to support
mdev device passing through when it could be isolated and protected
by the IOMMU units. The first part of this series (PATCH 1/10 ~ 5/10)
makes the Intel IOMMU driver to be aware of a mediated device. The
second part (PATCH 6/10 ~ 8/10) sets the iommu ops for the mdev bus.
The last part (PATCH 9/10 ~ 10/10) adds the domain type attribute to
a mdev device and allocates an iommu domain if it is IOMMU-capable.

This patch series depends on a patch set posted here [4] for discussion
which added the support for scalable mode in Intel IOMMU driver.

References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[4] https://lkml.org/lkml/2018/7/16/62

Best regards,
Lu Baolu

Lu Baolu (10):
  iommu/vt-d: Get iommu device for a mediated device
  iommu/vt-d: Alloc domain for a mediated device
  iommu/vt-d: Allocate groups for mediated devices
  iommu/vt-d: Get pasid table for a mediated device
  iommu/vt-d: Setup DMA remapping for mediated devices
  iommu: Add iommu_set_bus API interface
  iommu/vt-d: Add set_bus iommu ops
  vfio/mdev: Set iommu ops for mdev bus
  vfio/mdev: Add mediated device domain type
  vfio/type1: Allocate domain for mediated device

 drivers/iommu/intel-iommu.c      | 127 ++++++++++++++++++++++++++++++++++++---
 drivers/iommu/intel-pasid.c      |  16 ++++-
 drivers/iommu/intel-pasid.h      |  16 +++++
 drivers/iommu/iommu.c            |  23 +++++++
 drivers/vfio/mdev/mdev_core.c    |  30 ++++++++-
 drivers/vfio/mdev/mdev_driver.c  |  10 +++
 drivers/vfio/mdev/mdev_private.h |   1 +
 drivers/vfio/vfio_iommu_type1.c  |  43 +++++++++----
 include/linux/intel-iommu.h      |   5 ++
 include/linux/iommu.h            |  12 ++++
 include/linux/mdev.h             |  22 +++++++
 11 files changed, 282 insertions(+), 23 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-23  4:44   ` Liu, Yi L
  2018-07-24 18:50   ` Alex Williamson
  2018-07-22  6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This patch adds the support to get the iommu device for a mediated
device. The assumption is that each mediated device is a minimal
assignable set of a physical PCI device. Hence, we should use the
iommu device of the parent PCI device to manage the translation.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c |  6 ++++++
 drivers/iommu/intel-pasid.h | 16 ++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7d198ea..fc3ac1c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
 	if (iommu_dummy(dev))
 		return NULL;
 
+	if (dev_is_mdev(dev)) {
+		dev = dev_mdev_parent(dev);
+		if (WARN_ON(!dev_is_pci(dev)))
+			return NULL;
+	}
+
 	if (dev_is_pci(dev)) {
 		struct pci_dev *pf_pdev;
 
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 518df72..46cde66 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -35,6 +35,22 @@ struct pasid_table {
 	struct list_head	dev;		/* device list */
 };
 
+static inline bool dev_is_mdev(struct device *dev)
+{
+	if (!dev)
+		return false;
+
+	return !strcmp(dev->bus->name, "mdev");
+}
+
+static inline struct device *dev_mdev_parent(struct device *dev)
+{
+	if (WARN_ON(!dev_is_mdev(dev)))
+		return NULL;
+
+	return dev->parent;
+}
+
 extern u32 intel_pasid_max_id;
 int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
 void intel_pasid_free_id(int pasid);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 02/10] iommu/vt-d: Alloc domain for a mediated device
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-23  4:44   ` Liu, Yi L
  2018-07-22  6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

The PASID-granular 2nd level address translation makes it possible
to isolate and protect a mediated device exposed by a real device
which support PCI PASID features. This patch adds support to allocate
a domain for a mediated device. A default pasid value will be
allocated as well for the mediated device. This will be used by the
mediated device attached to this domain for non-SVM DMA transactions.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 17 ++++++++++++++++-
 include/linux/intel-iommu.h |  5 +++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fc3ac1c..3ede34a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1926,6 +1926,9 @@ static void domain_exit(struct dmar_domain *domain)
 	domain_remove_dev_info(domain);
 	rcu_read_unlock();
 
+	if (domain->default_pasid > 0)
+		intel_pasid_free_id(domain->default_pasid);
+
 	/* destroy iovas */
 	put_iova_domain(&domain->iovad);
 
@@ -2519,11 +2522,23 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 		}
 	}
 
+	if (dev && dev_is_mdev(dev) && domain->default_pasid <= 0) {
+		int max = intel_pasid_get_dev_max_id(dev_mdev_parent(dev));
+
+		domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN,
+							     max, GFP_KERNEL);
+		if (domain->default_pasid < 0) {
+			free_devinfo_mem(info);
+			return NULL;
+		}
+	}
+
 	spin_lock_irqsave(&device_domain_lock, flags);
 	if (dev)
 		found = find_domain(dev);
 
-	if (!found) {
+	/* A mediated device never has an DMA alias. Ignore searching. */
+	if (!found && !dev_is_mdev(dev)) {
 		struct device_domain_info *info2;
 		info2 = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn);
 		if (info2) {
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7efc632..9f5dcf6 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -423,6 +423,11 @@ struct dmar_domain {
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
 	u64		max_addr;	/* maximum mapped address */
 
+	int		default_pasid;	/*
+					 * The default pasid used for non-SVM
+					 * traffic on mediated devices.
+					 */
+
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
 };
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-23  4:44   ` Liu, Yi L
  2018-07-22  6:09 ` [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device Lu Baolu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

With the Intel IOMMU supporting PASID granularity isolation and
protection, a mediated device could be isolated and protected by
an IOMMU unit. We need to allocate a new group instead of a PCI
group.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3ede34a..57ccfc4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5276,6 +5276,17 @@ static void intel_iommu_put_resv_regions(struct device *dev,
 	}
 }
 
+static struct iommu_group *intel_iommu_device_group(struct device *dev)
+{
+	if (dev_is_pci(dev))
+		return pci_device_group(dev);
+
+	if (dev_is_mdev(dev))
+		return iommu_group_alloc();
+
+	return ERR_PTR(-EINVAL);
+}
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
 {
@@ -5370,7 +5381,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.remove_device		= intel_iommu_remove_device,
 	.get_resv_regions	= intel_iommu_get_resv_regions,
 	.put_resv_regions	= intel_iommu_put_resv_regions,
-	.device_group		= pci_device_group,
+	.device_group		= intel_iommu_device_group,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
 };
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (2 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This patch adds the support to get the pasid table for a mediated
device. The assumption is that each mediated device is a minimal
assignable set of a physical PCI device. Hence, we should use the
pasid table of the parent PCI device to manage the translation.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 1195c2a..9b4d462 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -129,7 +129,19 @@ int intel_pasid_alloc_table(struct device *dev)
 	int ret, order;
 
 	info = dev->archdata.iommu;
-	if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
+	if (WARN_ON(!info || info->pasid_table))
+		return -EINVAL;
+
+	/* Use parent PCI device pasid table for mdev: */
+	if (dev_is_mdev(dev)) {
+		pasid_table = intel_pasid_get_table(dev_mdev_parent(dev));
+		if (pasid_table)
+			goto attach_out;
+		else
+			return -ENOMEM;
+	}
+
+	if (WARN_ON(!dev_is_pci(dev)))
 		return -EINVAL;
 
 	/* DMA alias device already has a pasid table, use it: */
@@ -190,7 +202,7 @@ void intel_pasid_free_table(struct device *dev)
 	int i, max_pde;
 
 	info = dev->archdata.iommu;
-	if (!info || !dev_is_pci(dev) || !info->pasid_table)
+	if (!info || !info->pasid_table)
 		return;
 
 	pasid_table = info->pasid_table;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (3 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-23  4:44   ` Liu, Yi L
  2018-07-22  6:09 ` [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface Lu Baolu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This configures the second level page table when external components
request to allocate a domain for a mediated device.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 73 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 66 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 57ccfc4..b6e9ea8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2569,8 +2569,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 	if (dev)
 		dev->archdata.iommu = info;
 
-	if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
-		bool pass_through;
+	if (dev && sm_supported(iommu)) {
+		bool pass_through = hw_pass_through &&
+				domain_type_is_si(domain);
 
 		ret = intel_pasid_alloc_table(dev);
 		if (ret) {
@@ -2579,12 +2580,21 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 			return NULL;
 		}
 
-		/* Setup the PASID entry for requests without PASID: */
-		pass_through = hw_pass_through && domain_type_is_si(domain);
 		spin_lock(&iommu->lock);
-		intel_pasid_setup_second_level(iommu, domain, dev,
-					       PASID_RID2PASID,
-					       pass_through);
+
+		/* Setup the PASID entry for requests without PASID: */
+		if (dev_is_pci(dev))
+			intel_pasid_setup_second_level(iommu, domain, dev,
+						       PASID_RID2PASID,
+						       pass_through);
+		/* Setup the PASID entry for mediated devices: */
+		else if (dev_is_mdev(dev))
+			intel_pasid_setup_second_level(iommu, domain, dev,
+						       domain->default_pasid,
+						       false);
+		else
+			pr_err("Unsupported device %s\n", dev_name(dev));
+
 		spin_unlock(&iommu->lock);
 	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
@@ -4937,6 +4947,32 @@ static void domain_context_clear(struct intel_iommu *iommu, struct device *dev)
 	pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb, iommu);
 }
 
+static void
+iommu_flush_ext_iotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 gran)
+{
+	struct qi_desc desc;
+
+	desc.high = 0;
+	desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+			QI_EIOTLB_GRAN(gran) | QI_EIOTLB_TYPE;
+
+	qi_submit_sync(&desc, iommu);
+}
+
+static void iommu_flush_pasid_dev_iotlb(struct intel_iommu *iommu,
+					struct device *dev, int sid, int pasid)
+{
+	struct qi_desc desc;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int qdep = pci_ats_queue_depth(pdev);
+
+	desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+			QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE;
+	desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE;
+
+	qi_submit_sync(&desc, iommu);
+}
+
 static void __dmar_remove_one_dev_info(struct device_domain_info *info)
 {
 	struct intel_iommu *iommu;
@@ -4949,6 +4985,29 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
 
 	iommu = info->iommu;
 
+	if (dev_is_mdev(info->dev)) {
+		struct dmar_domain *domain = info->domain;
+		int did = domain->iommu_did[iommu->seq_id];
+		int sid = info->bus << 8 | info->devfn;
+		struct device *dev = info->dev;
+
+		intel_pasid_clear_entry(dev, domain->default_pasid);
+
+		/* Flush IOTLB including PASID Cache: */
+		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+
+		/*
+		 * Flush EIOTLB. The only way to flush global mappings within
+		 * a PASID is to use QI_GRAN_ALL_ALL.
+		 */
+		iommu_flush_ext_iotlb(iommu, did, domain->default_pasid,
+				      QI_GRAN_ALL_ALL);
+
+		/* Flush Dev TLB: */
+		iommu_flush_pasid_dev_iotlb(iommu, dev_mdev_parent(dev), sid,
+					    domain->default_pasid);
+	}
+
 	if (info->dev) {
 		iommu_disable_dev_iotlb(info);
 		domain_context_clear(iommu, info->dev);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (4 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops Lu Baolu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This adds iommu_set_bus API by adding a new set_bus ops in the
iommu_ops structure. A vendor IOMMU driver could either specify
its callback or safely ignore it. This interface could be used
to set the iommu methods used for a particular non-pci bus. One
consumer of this interface could be vfio/mdev bus when the mdev
devices could be exclusively protected by the IOMMU units.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 23 +++++++++++++++++++++++
 include/linux/iommu.h | 12 ++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63b3756..b22b0a7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1976,3 +1976,26 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+int iommu_set_bus(struct bus_type *bus)
+{
+	struct iommu_device *iommu;
+	int ret = -ENODEV;
+
+	spin_lock(&iommu_device_lock);
+	/*
+	 * Iterate over iommu list and try setting bus with
+	 * each iommu until a successful setting.
+	 */
+	list_for_each_entry(iommu, &iommu_device_list, list) {
+		if (iommu->ops->set_bus) {
+			ret = iommu->ops->set_bus(bus, iommu);
+			if (!ret)
+				break;
+		}
+	}
+	spin_unlock(&iommu_device_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_set_bus);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee..2679796 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -48,6 +48,7 @@ struct bus_type;
 struct device;
 struct iommu_domain;
 struct notifier_block;
+struct iommu_device;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
@@ -187,6 +188,7 @@ struct iommu_resv_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @set_bus: set iommu ops for a non-pci bus
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -235,6 +237,9 @@ struct iommu_ops {
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 	bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
 
+	/* Set iommu ops for a bus */
+	int (*set_bus)(struct bus_type *bus, struct iommu_device *iommu);
+
 	unsigned long pgsize_bitmap;
 };
 
@@ -412,6 +417,8 @@ void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 
+int iommu_set_bus(struct bus_type *bus);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -696,6 +703,11 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline int iommu_set_bus(struct bus_type *bus)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (5 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus Lu Baolu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This specifies the Intel IOMMU specific set_bus operation.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b6e9ea8..de36292 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5346,6 +5346,23 @@ static struct iommu_group *intel_iommu_device_group(struct device *dev)
 	return ERR_PTR(-EINVAL);
 }
 
+static int intel_iommu_set_bus(struct bus_type *bus, struct iommu_device *idev)
+{
+	struct intel_iommu *iommu;
+
+	iommu = container_of(idev, struct intel_iommu, iommu);
+	if (strcmp(bus->name, "mdev"))
+		return -EINVAL;
+
+	if (!sm_supported(iommu))
+		return -ENODEV;
+
+	bus_set_iommu(bus, &intel_iommu_ops);
+	bus_register_notifier(bus, &device_nb);
+
+	return 0;
+}
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
 {
@@ -5440,6 +5457,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.remove_device		= intel_iommu_remove_device,
 	.get_resv_regions	= intel_iommu_get_resv_regions,
 	.put_resv_regions	= intel_iommu_put_resv_regions,
+	.set_bus		= intel_iommu_set_bus,
 	.device_group		= intel_iommu_device_group,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
 };
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (6 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device Lu Baolu
  9 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This sets the iommu ops for the mdev bus with iommu_set_bus().
With the iommu ops setting, a mediated device might be isolated
and protected by an IOMMU unit.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c   |  9 ++++++++-
 drivers/vfio/mdev/mdev_driver.c | 10 ++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0212f0e..d8f19ba 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -16,6 +16,7 @@
 #include <linux/uuid.h>
 #include <linux/sysfs.h>
 #include <linux/mdev.h>
+#include <linux/iommu.h>
 
 #include "mdev_private.h"
 
@@ -392,7 +393,13 @@ int mdev_device_remove(struct device *dev, bool force_remove)
 
 static int __init mdev_init(void)
 {
-	return mdev_bus_register();
+	int ret;
+
+	ret = mdev_bus_register();
+	if (!ret)
+		iommu_set_bus(&mdev_bus_type);
+
+	return ret;
 }
 
 static void __exit mdev_exit(void)
diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
index 6f0391f..76ac9e6 100644
--- a/drivers/vfio/mdev/mdev_driver.c
+++ b/drivers/vfio/mdev/mdev_driver.c
@@ -21,6 +21,13 @@ static int mdev_attach_iommu(struct mdev_device *mdev)
 	int ret;
 	struct iommu_group *group;
 
+	/*
+	 * If iommu_ops is set for bus, add_device() will allocate
+	 * a group and add the device in the group.
+	 */
+	if (iommu_present(mdev->dev.bus))
+		return 0;
+
 	group = iommu_group_alloc();
 	if (IS_ERR(group))
 		return PTR_ERR(group);
@@ -36,6 +43,9 @@ static int mdev_attach_iommu(struct mdev_device *mdev)
 
 static void mdev_detach_iommu(struct mdev_device *mdev)
 {
+	if (iommu_present(mdev->dev.bus))
+		return;
+
 	iommu_group_remove_device(&mdev->dev);
 	dev_info(&mdev->dev, "MDEV: detaching iommu\n");
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (7 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  2018-07-22  6:09 ` [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device Lu Baolu
  9 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

A parent device might create different types of mediated
devices. For example, a mediated device could be created
on the parent device with a PASID tagged. When the iommu
supports PASID-granular translations, the mediated device
is individually protected and isolated by the iommu. It's
hence possible to allocate a domain for each such device.

This patch defines the domain types of a mediated device
and allows the parent driver to specify this attribute
after a mdev is actually created.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c    | 21 +++++++++++++++++++++
 drivers/vfio/mdev/mdev_private.h |  1 +
 include/linux/mdev.h             | 22 ++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index d8f19ba..4d82b36 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -391,6 +391,27 @@ int mdev_device_remove(struct device *dev, bool force_remove)
 	return 0;
 }
 
+int mdev_set_domain_type(struct device *dev, enum mdev_domain_type type)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+
+	if (type == DOMAIN_TYPE_PRIVATE && !iommu_present(&mdev_bus_type))
+		return -EINVAL;
+
+	mdev->domain_type = type;
+
+	return 0;
+}
+EXPORT_SYMBOL(mdev_set_domain_type);
+
+enum mdev_domain_type mdev_get_domain_type(struct device *dev)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+
+	return mdev->domain_type;
+}
+EXPORT_SYMBOL(mdev_get_domain_type);
+
 static int __init mdev_init(void)
 {
 	int ret;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index b5819b7..d47a670 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -34,6 +34,7 @@ struct mdev_device {
 	struct list_head next;
 	struct kobject *type_kobj;
 	bool active;
+	int domain_type;
 };
 
 #define to_mdev_device(dev)	container_of(dev, struct mdev_device, dev)
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index b6e048e..5d862b0 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -15,6 +15,28 @@
 
 struct mdev_device;
 
+enum mdev_domain_type {
+	DOMAIN_TYPE_EXTERNAL,	/* Use the external domain and all
+				 * IOMMU staff controlled by the
+				 * parent device driver.
+				 */
+	DOMAIN_TYPE_INHERITANCE,/* Use the same domain as the parent device. */
+	DOMAIN_TYPE_PRIVATE,	/* Capable of having a private domain. For an
+				 * example, the parent device is able to bind
+				 * a specific PASID for a mediated device and
+				 * transferring data with the asigned PASID.
+				 */
+};
+
+/*
+ * Called by the parent device driver to set the domain type.
+ * By default, the domain type is set to DOMAIN_TYPE_EXTERNAL.
+ */
+int mdev_set_domain_type(struct device *dev, enum mdev_domain_type type);
+
+/* Check the domain type. */
+enum mdev_domain_type mdev_get_domain_type(struct device *dev);
+
 /**
  * struct mdev_parent_ops - Structure to be registered for each parent device to
  * register the device to mdev module.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device
  2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
                   ` (8 preceding siblings ...)
  2018-07-22  6:09 ` [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type Lu Baolu
@ 2018-07-22  6:09 ` Lu Baolu
  9 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-22  6:09 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan

This allocates a domain for the mediated device if it is able to
be isolated and protected individually by IOMMU.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 43 ++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3e5b177..496bea6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1177,6 +1177,22 @@ static int vfio_bus_type(struct device *dev, void *data)
 	return 0;
 }
 
+static int mdev_private_domain(struct device *dev, void *data)
+{
+	enum mdev_domain_type type;
+	enum mdev_domain_type (*fn)(struct device *dev);
+
+	fn = symbol_get(mdev_get_domain_type);
+	if (fn) {
+		type = fn(dev);
+		symbol_put(mdev_get_domain_type);
+
+		return type != DOMAIN_TYPE_PRIVATE;
+	}
+
+	return -EINVAL;
+}
+
 static int vfio_iommu_replay(struct vfio_iommu *iommu,
 			     struct vfio_domain *domain)
 {
@@ -1371,18 +1387,23 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	mdev_bus = symbol_get(mdev_bus_type);
 
 	if (mdev_bus) {
-		if ((bus == mdev_bus) && !iommu_present(bus)) {
-			symbol_put(mdev_bus_type);
-			if (!iommu->external_domain) {
-				INIT_LIST_HEAD(&domain->group_list);
-				iommu->external_domain = domain;
-			} else
-				kfree(domain);
+		if (bus == mdev_bus) {
+			ret = iommu_group_for_each_dev(iommu_group, NULL,
+						       mdev_private_domain);
+			if (!iommu_present(bus) || ret) {
+				symbol_put(mdev_bus_type);
+				if (!iommu->external_domain) {
+					INIT_LIST_HEAD(&domain->group_list);
+					iommu->external_domain = domain;
+				} else {
+					kfree(domain);
+				}
 
-			list_add(&group->next,
-				 &iommu->external_domain->group_list);
-			mutex_unlock(&iommu->lock);
-			return 0;
+				list_add(&group->next,
+					 &iommu->external_domain->group_list);
+				mutex_unlock(&iommu->lock);
+				return 0;
+			}
 		}
 		symbol_put(mdev_bus_type);
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device
  2018-07-22  6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu
@ 2018-07-23  4:44   ` Liu, Yi L
  2018-07-24  2:06     ` Lu Baolu
  2018-07-24 18:50   ` Alex Williamson
  1 sibling, 1 reply; 32+ messages in thread
From: Liu, Yi L @ 2018-07-23  4:44 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

Hi Baolu,

Per my understanding, the subject is not accurate. I think this patch is
to get parent device structure for a mediate device instead of iommu
device. This may be misleading for reviewers.

Thanks,
Yi Liu

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Sunday, July 22, 2018 2:09 PM
> 
> This patch adds the support to get the iommu device for a mediated device. The
> assumption is that each mediated device is a minimal assignable set of a physical PCI
> device. Hence, we should use the iommu device of the parent PCI device to manage
> the translation.
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c |  6 ++++++  drivers/iommu/intel-pasid.h | 16
> ++++++++++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 7d198ea..fc3ac1c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device
> *dev, u8 *bus, u8 *devf
>  	if (iommu_dummy(dev))
>  		return NULL;
> 
> +	if (dev_is_mdev(dev)) {
> +		dev = dev_mdev_parent(dev);
> +		if (WARN_ON(!dev_is_pci(dev)))
> +			return NULL;
> +	}
> +
>  	if (dev_is_pci(dev)) {
>  		struct pci_dev *pf_pdev;
> 
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index
> 518df72..46cde66 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -35,6 +35,22 @@ struct pasid_table {
>  	struct list_head	dev;		/* device list */
>  };
> 
> +static inline bool dev_is_mdev(struct device *dev) {
> +	if (!dev)
> +		return false;
> +
> +	return !strcmp(dev->bus->name, "mdev"); }
> +
> +static inline struct device *dev_mdev_parent(struct device *dev) {
> +	if (WARN_ON(!dev_is_mdev(dev)))
> +		return NULL;
> +
> +	return dev->parent;
> +}
> +
>  extern u32 intel_pasid_max_id;
>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);  void
> intel_pasid_free_id(int pasid);
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 02/10] iommu/vt-d: Alloc domain for a mediated device
  2018-07-22  6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu
@ 2018-07-23  4:44   ` Liu, Yi L
  2018-07-24  2:09     ` Lu Baolu
  0 siblings, 1 reply; 32+ messages in thread
From: Liu, Yi L @ 2018-07-23  4:44 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Sunday, July 22, 2018 2:09 PM
> 
> The PASID-granular 2nd level address translation makes it possible to isolate and
> protect a mediated device exposed by a real device which support PCI PASID

"support" -> "supports"

> features. This patch adds support to allocate a domain for a mediated device. A
> default pasid value will be allocated as well for the mediated device. This will be used

This is not accurate, it is "a default pasid value will be allocated for the domain, and the
mdev will be configed to use this pasid when it is added to this domain"

Regards,
Yi Liu

> by the mediated device attached to this domain for non-SVM DMA transactions.
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 17 ++++++++++++++++-  include/linux/intel-iommu.h
> |  5 +++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> fc3ac1c..3ede34a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1926,6 +1926,9 @@ static void domain_exit(struct dmar_domain *domain)
>  	domain_remove_dev_info(domain);
>  	rcu_read_unlock();
> 
> +	if (domain->default_pasid > 0)
> +		intel_pasid_free_id(domain->default_pasid);
> +
>  	/* destroy iovas */
>  	put_iova_domain(&domain->iovad);
> 
> @@ -2519,11 +2522,23 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>  		}
>  	}
> 
> +	if (dev && dev_is_mdev(dev) && domain->default_pasid <= 0) {
> +		int max = intel_pasid_get_dev_max_id(dev_mdev_parent(dev));
> +
> +		domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> +							     max, GFP_KERNEL);
> +		if (domain->default_pasid < 0) {
> +			free_devinfo_mem(info);
> +			return NULL;
> +		}
> +	}
> +
>  	spin_lock_irqsave(&device_domain_lock, flags);
>  	if (dev)
>  		found = find_domain(dev);
> 
> -	if (!found) {
> +	/* A mediated device never has an DMA alias. Ignore searching. */
> +	if (!found && !dev_is_mdev(dev)) {
>  		struct device_domain_info *info2;
>  		info2 = dmar_search_domain_by_dev_info(iommu->segment, bus,
> devfn);
>  		if (info2) {
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
> 7efc632..9f5dcf6 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -423,6 +423,11 @@ struct dmar_domain {
>  					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>  	u64		max_addr;	/* maximum mapped address */
> 
> +	int		default_pasid;	/*
> +					 * The default pasid used for non-SVM
> +					 * traffic on mediated devices.
> +					 */
> +
>  	struct iommu_domain domain;	/* generic domain data structure for
>  					   iommu core */
>  };
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-22  6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu
@ 2018-07-23  4:44   ` Liu, Yi L
  2018-07-24  2:22     ` Lu Baolu
  0 siblings, 1 reply; 32+ messages in thread
From: Liu, Yi L @ 2018-07-23  4:44 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Sunday, July 22, 2018 2:09 PM
> 
> With the Intel IOMMU supporting PASID granularity isolation and protection, a
> mediated device could be isolated and protected by an IOMMU unit. We need to
> allocate a new group instead of a PCI group.

Existing vfio mdev framework also allocates an iommu group for mediate device.

mdev_probe()
  |_ mdev_attach_iommu()
       |_ iommu_group_alloc()

IMHO, this patch actually do a wrapper to the group allocation API. The reason is: it
is not necessary to apply the group allocation rules when the allocation is for mediated
device. Instead, just allocate a new one for it. :)

Thanks,
Yi Liu
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 3ede34a..57ccfc4 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5276,6 +5276,17 @@ static void intel_iommu_put_resv_regions(struct device
> *dev,
>  	}
>  }
> 
> +static struct iommu_group *intel_iommu_device_group(struct device *dev)
> +{
> +	if (dev_is_pci(dev))
> +		return pci_device_group(dev);
> +
> +	if (dev_is_mdev(dev))
> +		return iommu_group_alloc();
> +
> +	return ERR_PTR(-EINVAL);
> +}
> +
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev
> *sdev)  { @@ -5370,7 +5381,7 @@ const struct iommu_ops intel_iommu_ops = {
>  	.remove_device		= intel_iommu_remove_device,
>  	.get_resv_regions	= intel_iommu_get_resv_regions,
>  	.put_resv_regions	= intel_iommu_put_resv_regions,
> -	.device_group		= pci_device_group,
> +	.device_group		= intel_iommu_device_group,
>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
>  };
> 
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices
  2018-07-22  6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu
@ 2018-07-23  4:44   ` Liu, Yi L
  2018-07-24  2:29     ` Lu Baolu
  0 siblings, 1 reply; 32+ messages in thread
From: Liu, Yi L @ 2018-07-23  4:44 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Sunday, July 22, 2018 2:09 PM
> 
> This configures the second level page table when external components request to
> allocate a domain for a mediated device.

I think it should be the time when adding a mediate device to the domain.

> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 73
> ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 66 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 57ccfc4..b6e9ea8 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2569,8 +2569,9 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>  	if (dev)
>  		dev->archdata.iommu = info;
> 
> -	if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
> -		bool pass_through;
> +	if (dev && sm_supported(iommu)) {
> +		bool pass_through = hw_pass_through &&
> +				domain_type_is_si(domain);
> 
>  		ret = intel_pasid_alloc_table(dev);
>  		if (ret) {
> @@ -2579,12 +2580,21 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>  			return NULL;
>  		}
> 
> -		/* Setup the PASID entry for requests without PASID: */
> -		pass_through = hw_pass_through && domain_type_is_si(domain);
>  		spin_lock(&iommu->lock);
> -		intel_pasid_setup_second_level(iommu, domain, dev,
> -					       PASID_RID2PASID,
> -					       pass_through);
> +
> +		/* Setup the PASID entry for requests without PASID: */
> +		if (dev_is_pci(dev))
> +			intel_pasid_setup_second_level(iommu, domain, dev,
> +						       PASID_RID2PASID,

can domain->default_pasid be initialized to be PASID_RID2PASID if the domain
is not a domain used by mediated device?

> +						       pass_through);
> +		/* Setup the PASID entry for mediated devices: */
> +		else if (dev_is_mdev(dev))
> +			intel_pasid_setup_second_level(iommu, domain, dev,
> +						       domain->default_pasid,
> +						       false);
> +		else
> +			pr_err("Unsupported device %s\n", dev_name(dev));
> +
>  		spin_unlock(&iommu->lock);
>  	}
>  	spin_unlock_irqrestore(&device_domain_lock, flags); @@ -4937,6 +4947,32
> @@ static void domain_context_clear(struct intel_iommu *iommu, struct device
> *dev)
>  	pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb,
> iommu);  }
> 
> +static void
> +iommu_flush_ext_iotlb(struct intel_iommu *iommu, u16 did, u32 pasid,

Per latest VT-d spec, its new name is pasid-based-iotlb. Pls update the naming
accordingly. :)

Regards,
Yi Liu

> +u64 gran) {
> +	struct qi_desc desc;
> +
> +	desc.high = 0;
> +	desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> +			QI_EIOTLB_GRAN(gran) | QI_EIOTLB_TYPE;
> +
> +	qi_submit_sync(&desc, iommu);
> +}
> +
> +static void iommu_flush_pasid_dev_iotlb(struct intel_iommu *iommu,
> +					struct device *dev, int sid, int pasid) {
> +	struct qi_desc desc;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int qdep = pci_ats_queue_depth(pdev);
> +
> +	desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
> +			QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE;
> +	desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE;
> +
> +	qi_submit_sync(&desc, iommu);
> +}
> +
>  static void __dmar_remove_one_dev_info(struct device_domain_info *info)  {
>  	struct intel_iommu *iommu;
> @@ -4949,6 +4985,29 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info)
> 
>  	iommu = info->iommu;
> 
> +	if (dev_is_mdev(info->dev)) {
> +		struct dmar_domain *domain = info->domain;
> +		int did = domain->iommu_did[iommu->seq_id];
> +		int sid = info->bus << 8 | info->devfn;
> +		struct device *dev = info->dev;
> +
> +		intel_pasid_clear_entry(dev, domain->default_pasid);
> +
> +		/* Flush IOTLB including PASID Cache: */
> +		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> +
> +		/*
> +		 * Flush EIOTLB. The only way to flush global mappings within
> +		 * a PASID is to use QI_GRAN_ALL_ALL.
> +		 */
> +		iommu_flush_ext_iotlb(iommu, did, domain->default_pasid,
> +				      QI_GRAN_ALL_ALL);
> +
> +		/* Flush Dev TLB: */
> +		iommu_flush_pasid_dev_iotlb(iommu, dev_mdev_parent(dev), sid,
> +					    domain->default_pasid);
> +	}
> +
>  	if (info->dev) {
>  		iommu_disable_dev_iotlb(info);
>  		domain_context_clear(iommu, info->dev);
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device
  2018-07-23  4:44   ` Liu, Yi L
@ 2018-07-24  2:06     ` Lu Baolu
  0 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-24  2:06 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson,
	Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

Hi Yi,

Thank you for reviewing my patch.

On 07/23/2018 12:44 PM, Liu, Yi L wrote:
> Hi Baolu,
>
> Per my understanding, the subject is not accurate. I think this patch is
> to get parent device structure for a mediate device instead of iommu
> device. This may be misleading for reviewers.

Please check below.

>
> Thanks,
> Yi Liu
>
>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>> Sent: Sunday, July 22, 2018 2:09 PM
>>
>> This patch adds the support to get the iommu device for a mediated device. The
>> assumption is that each mediated device is a minimal assignable set of a physical PCI
>> device. Hence, we should use the iommu device of the parent PCI device to manage
>> the translation.
>>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/iommu/intel-iommu.c |  6 ++++++  drivers/iommu/intel-pasid.h | 16
>> ++++++++++++++++
>>  2 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
>> 7d198ea..fc3ac1c 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device
>> *dev, u8 *bus, u8 *devf
>>  	if (iommu_dummy(dev))
>>  		return NULL;
>>
>> +	if (dev_is_mdev(dev)) {
>> +		dev = dev_mdev_parent(dev);
>> +		if (WARN_ON(!dev_is_pci(dev)))
>> +			return NULL;
>> +	}
>> +

This gets the parent of a mediated device. And an iommu device will be
found which services the parent device. A mediated device should in the
same iommu scope as its parent.

Best regards,
Lu Baolu

>>  	if (dev_is_pci(dev)) {
>>  		struct pci_dev *pf_pdev;
>>
>> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index
>> 518df72..46cde66 100644
>> --- a/drivers/iommu/intel-pasid.h
>> +++ b/drivers/iommu/intel-pasid.h
>> @@ -35,6 +35,22 @@ struct pasid_table {
>>  	struct list_head	dev;		/* device list */
>>  };
>>
>> +static inline bool dev_is_mdev(struct device *dev) {
>> +	if (!dev)
>> +		return false;
>> +
>> +	return !strcmp(dev->bus->name, "mdev"); }
>> +
>> +static inline struct device *dev_mdev_parent(struct device *dev) {
>> +	if (WARN_ON(!dev_is_mdev(dev)))
>> +		return NULL;
>> +
>> +	return dev->parent;
>> +}
>> +
>>  extern u32 intel_pasid_max_id;
>>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);  void
>> intel_pasid_free_id(int pasid);
>> --
>> 2.7.4
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 02/10] iommu/vt-d: Alloc domain for a mediated device
  2018-07-23  4:44   ` Liu, Yi L
@ 2018-07-24  2:09     ` Lu Baolu
  0 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-24  2:09 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson,
	Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

Hi,

On 07/23/2018 12:44 PM, Liu, Yi L wrote:
>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>> Sent: Sunday, July 22, 2018 2:09 PM
>>
>> The PASID-granular 2nd level address translation makes it possible to isolate and
>> protect a mediated device exposed by a real device which support PCI PASID
> "support" -> "supports"

Sure.

>
>> features. This patch adds support to allocate a domain for a mediated device. A
>> default pasid value will be allocated as well for the mediated device. This will be used
> This is not accurate, it is "a default pasid value will be allocated for the domain, and the
> mdev will be configed to use this pasid when it is added to this domain"

Looks better. Thank you.

Best regards,
Lu Baolu

>
> Regards,
> Yi Liu
>
>> by the mediated device attached to this domain for non-SVM DMA transactions.
>>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/iommu/intel-iommu.c | 17 ++++++++++++++++-  include/linux/intel-iommu.h
>> |  5 +++++
>>  2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
>> fc3ac1c..3ede34a 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -1926,6 +1926,9 @@ static void domain_exit(struct dmar_domain *domain)
>>  	domain_remove_dev_info(domain);
>>  	rcu_read_unlock();
>>
>> +	if (domain->default_pasid > 0)
>> +		intel_pasid_free_id(domain->default_pasid);
>> +
>>  	/* destroy iovas */
>>  	put_iova_domain(&domain->iovad);
>>
>> @@ -2519,11 +2522,23 @@ static struct dmar_domain
>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>>  		}
>>  	}
>>
>> +	if (dev && dev_is_mdev(dev) && domain->default_pasid <= 0) {
>> +		int max = intel_pasid_get_dev_max_id(dev_mdev_parent(dev));
>> +
>> +		domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN,
>> +							     max, GFP_KERNEL);
>> +		if (domain->default_pasid < 0) {
>> +			free_devinfo_mem(info);
>> +			return NULL;
>> +		}
>> +	}
>> +
>>  	spin_lock_irqsave(&device_domain_lock, flags);
>>  	if (dev)
>>  		found = find_domain(dev);
>>
>> -	if (!found) {
>> +	/* A mediated device never has an DMA alias. Ignore searching. */
>> +	if (!found && !dev_is_mdev(dev)) {
>>  		struct device_domain_info *info2;
>>  		info2 = dmar_search_domain_by_dev_info(iommu->segment, bus,
>> devfn);
>>  		if (info2) {
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
>> 7efc632..9f5dcf6 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -423,6 +423,11 @@ struct dmar_domain {
>>  					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>>  	u64		max_addr;	/* maximum mapped address */
>>
>> +	int		default_pasid;	/*
>> +					 * The default pasid used for non-SVM
>> +					 * traffic on mediated devices.
>> +					 */
>> +
>>  	struct iommu_domain domain;	/* generic domain data structure for
>>  					   iommu core */
>>  };
>> --
>> 2.7.4
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-23  4:44   ` Liu, Yi L
@ 2018-07-24  2:22     ` Lu Baolu
  2018-07-24 11:30       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 32+ messages in thread
From: Lu Baolu @ 2018-07-24  2:22 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson,
	Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

Hi,

On 07/23/2018 12:44 PM, Liu, Yi L wrote:
>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>> Sent: Sunday, July 22, 2018 2:09 PM
>>
>> With the Intel IOMMU supporting PASID granularity isolation and protection, a
>> mediated device could be isolated and protected by an IOMMU unit. We need to
>> allocate a new group instead of a PCI group.
> Existing vfio mdev framework also allocates an iommu group for mediate device.
>
> mdev_probe()
>   |_ mdev_attach_iommu()
>        |_ iommu_group_alloc()

When external components ask iommu to allocate a group for a device,
it will call pci_device_group in Intel IOMMU driver's @device_group
callback. In another word, current Intel IOMMU driver doesn't aware
the mediated device and treat all devices as PCI ones. This patch
extends the @device_group call back to make it aware of a mediated
device.

Best regards,
Lu Baolu

>
> IMHO, this patch actually do a wrapper to the group allocation API. The reason is: it
> is not necessary to apply the group allocation rules when the allocation is for mediated
> device. Instead, just allocate a new one for it. :)
>
> Thanks,
> Yi Liu
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/iommu/intel-iommu.c | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
>> 3ede34a..57ccfc4 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5276,6 +5276,17 @@ static void intel_iommu_put_resv_regions(struct device
>> *dev,
>>  	}
>>  }
>>
>> +static struct iommu_group *intel_iommu_device_group(struct device *dev)
>> +{
>> +	if (dev_is_pci(dev))
>> +		return pci_device_group(dev);
>> +
>> +	if (dev_is_mdev(dev))
>> +		return iommu_group_alloc();
>> +
>> +	return ERR_PTR(-EINVAL);
>> +}
>> +
>>  #ifdef CONFIG_INTEL_IOMMU_SVM
>>  int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev
>> *sdev)  { @@ -5370,7 +5381,7 @@ const struct iommu_ops intel_iommu_ops = {
>>  	.remove_device		= intel_iommu_remove_device,
>>  	.get_resv_regions	= intel_iommu_get_resv_regions,
>>  	.put_resv_regions	= intel_iommu_put_resv_regions,
>> -	.device_group		= pci_device_group,
>> +	.device_group		= intel_iommu_device_group,
>>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
>>  };
>>
>> --
>> 2.7.4
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices
  2018-07-23  4:44   ` Liu, Yi L
@ 2018-07-24  2:29     ` Lu Baolu
  0 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-24  2:29 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson,
	Kirti Wankhede
  Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun,
	Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan

Hi,

On 07/23/2018 12:44 PM, Liu, Yi L wrote:
>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>> Sent: Sunday, July 22, 2018 2:09 PM
>>
>> This configures the second level page table when external components request to
>> allocate a domain for a mediated device.
> I think it should be the time when adding a mediate device to the domain.

You are right. Will correct it.

>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/iommu/intel-iommu.c | 73
>> ++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 66 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
>> 57ccfc4..b6e9ea8 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2569,8 +2569,9 @@ static struct dmar_domain
>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>>  	if (dev)
>>  		dev->archdata.iommu = info;
>>
>> -	if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
>> -		bool pass_through;
>> +	if (dev && sm_supported(iommu)) {
>> +		bool pass_through = hw_pass_through &&
>> +				domain_type_is_si(domain);
>>
>>  		ret = intel_pasid_alloc_table(dev);
>>  		if (ret) {
>> @@ -2579,12 +2580,21 @@ static struct dmar_domain
>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>>  			return NULL;
>>  		}
>>
>> -		/* Setup the PASID entry for requests without PASID: */
>> -		pass_through = hw_pass_through && domain_type_is_si(domain);
>>  		spin_lock(&iommu->lock);
>> -		intel_pasid_setup_second_level(iommu, domain, dev,
>> -					       PASID_RID2PASID,
>> -					       pass_through);
>> +
>> +		/* Setup the PASID entry for requests without PASID: */
>> +		if (dev_is_pci(dev))
>> +			intel_pasid_setup_second_level(iommu, domain, dev,
>> +						       PASID_RID2PASID,
> can domain->default_pasid be initialized to be PASID_RID2PASID if the domain
> is not a domain used by mediated device?

PASID_RID2PASID is 0 as well.


>
>> +						       pass_through);
>> +		/* Setup the PASID entry for mediated devices: */
>> +		else if (dev_is_mdev(dev))
>> +			intel_pasid_setup_second_level(iommu, domain, dev,
>> +						       domain->default_pasid,
>> +						       false);
>> +		else
>> +			pr_err("Unsupported device %s\n", dev_name(dev));
>> +
>>  		spin_unlock(&iommu->lock);
>>  	}
>>  	spin_unlock_irqrestore(&device_domain_lock, flags); @@ -4937,6 +4947,32
>> @@ static void domain_context_clear(struct intel_iommu *iommu, struct device
>> *dev)
>>  	pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb,
>> iommu);  }
>>
>> +static void
>> +iommu_flush_ext_iotlb(struct intel_iommu *iommu, u16 did, u32 pasid,
> Per latest VT-d spec, its new name is pasid-based-iotlb. Pls update the naming
> accordingly. :)

Okay.

Best regards,
Lu Baolu

>
> Regards,
> Yi Liu
>
>> +u64 gran) {
>> +	struct qi_desc desc;
>> +
>> +	desc.high = 0;
>> +	desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
>> +			QI_EIOTLB_GRAN(gran) | QI_EIOTLB_TYPE;
>> +
>> +	qi_submit_sync(&desc, iommu);
>> +}
>> +
>> +static void iommu_flush_pasid_dev_iotlb(struct intel_iommu *iommu,
>> +					struct device *dev, int sid, int pasid) {
>> +	struct qi_desc desc;
>> +	struct pci_dev *pdev = to_pci_dev(dev);
>> +	int qdep = pci_ats_queue_depth(pdev);
>> +
>> +	desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
>> +			QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE;
>> +	desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE;
>> +
>> +	qi_submit_sync(&desc, iommu);
>> +}
>> +
>>  static void __dmar_remove_one_dev_info(struct device_domain_info *info)  {
>>  	struct intel_iommu *iommu;
>> @@ -4949,6 +4985,29 @@ static void __dmar_remove_one_dev_info(struct
>> device_domain_info *info)
>>
>>  	iommu = info->iommu;
>>
>> +	if (dev_is_mdev(info->dev)) {
>> +		struct dmar_domain *domain = info->domain;
>> +		int did = domain->iommu_did[iommu->seq_id];
>> +		int sid = info->bus << 8 | info->devfn;
>> +		struct device *dev = info->dev;
>> +
>> +		intel_pasid_clear_entry(dev, domain->default_pasid);
>> +
>> +		/* Flush IOTLB including PASID Cache: */
>> +		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
>> +
>> +		/*
>> +		 * Flush EIOTLB. The only way to flush global mappings within
>> +		 * a PASID is to use QI_GRAN_ALL_ALL.
>> +		 */
>> +		iommu_flush_ext_iotlb(iommu, did, domain->default_pasid,
>> +				      QI_GRAN_ALL_ALL);
>> +
>> +		/* Flush Dev TLB: */
>> +		iommu_flush_pasid_dev_iotlb(iommu, dev_mdev_parent(dev), sid,
>> +					    domain->default_pasid);
>> +	}
>> +
>>  	if (info->dev) {
>>  		iommu_disable_dev_iotlb(info);
>>  		domain_context_clear(iommu, info->dev);
>> --
>> 2.7.4
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-24  2:22     ` Lu Baolu
@ 2018-07-24 11:30       ` Jean-Philippe Brucker
  2018-07-24 19:51         ` Alex Williamson
                           ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Jean-Philippe Brucker @ 2018-07-24 11:30 UTC (permalink / raw)
  To: Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

Hi Baolu,

On 24/07/18 03:22, Lu Baolu wrote:
> Hi,
> 
> On 07/23/2018 12:44 PM, Liu, Yi L wrote:
>>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>>> Sent: Sunday, July 22, 2018 2:09 PM
>>>
>>> With the Intel IOMMU supporting PASID granularity isolation and protection, a
>>> mediated device could be isolated and protected by an IOMMU unit. We need to
>>> allocate a new group instead of a PCI group.
>> Existing vfio mdev framework also allocates an iommu group for mediate device.
>>
>> mdev_probe()
>>   |_ mdev_attach_iommu()
>>        |_ iommu_group_alloc()
> 
> When external components ask iommu to allocate a group for a device,
> it will call pci_device_group in Intel IOMMU driver's @device_group
> callback. In another word, current Intel IOMMU driver doesn't aware
> the mediated device and treat all devices as PCI ones. This patch
> extends the @device_group call back to make it aware of a mediated
> device.

I agree that allocating two groups for an mdev seems strange, and in my
opinion we shouldn't export the notion of mdev to IOMMU drivers.
Otherwise each driver will have to add its own "dev_is_mdev()" special
cases, which will get messy in the long run. Besides, the macro is
currently private, and to be exported it should be wrapped in
symbol_get/put(mdev_bus_type).

There is another way: as we're planning to add a generic pasid_alloc()
function to the IOMMU API, the mdev module itself could allocate a
default PASID for each mdev by calling pasid_alloc() on the mdev's
parent, and then do map()/unmap() with that PASID. This way we don't
have to add IOMMU ops to the mdev bus, everything can still be done
using the ops of the parent. And IOMMU drivers "only" have to implement
PASID ops, which will be reused by drivers other than mdev.

The allocated PASID also needs to be installed into the parent device.
If the mdev module knows the PASID, we can do that by adding
set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to
mdev_parent_ops.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device
  2018-07-22  6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu
  2018-07-23  4:44   ` Liu, Yi L
@ 2018-07-24 18:50   ` Alex Williamson
  2018-07-25  1:18     ` Lu Baolu
  1 sibling, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2018-07-24 18:50 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, David Woodhouse, Kirti Wankhede, ashok.raj,
	sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun,
	peterx, iommu, kvm, linux-kernel, Jacob Pan

On Sun, 22 Jul 2018 14:09:24 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> This patch adds the support to get the iommu device for a mediated
> device. The assumption is that each mediated device is a minimal
> assignable set of a physical PCI device. Hence, we should use the
> iommu device of the parent PCI device to manage the translation.

Hmm, is this absolutely a valid assumption?  I'm afraid there's an
assumption throughout this series that the only way an mdev device
could be interacting with the IOMMU is via S-IOV, but we could choose
today to make an mdev wrapper for any device, which might provide basic
RID granularity to the IOMMU.  So if I make an mdev wrapper for a PF
such that I can implement migration for that device, is it valid for
the IOMMU driver to flag me as an mdev device and base mappings on my
parent device?  Perhaps in this patch we can assume that the parent of
such an mdev device would be the PF backing it and that results in the
correct drhd, but in the next patch we start imposing the assumption
that because the device is an mdev, the only valid association is via
pasid, which I'd say is incorrect.
 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c |  6 ++++++
>  drivers/iommu/intel-pasid.h | 16 ++++++++++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7d198ea..fc3ac1c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
>  	if (iommu_dummy(dev))
>  		return NULL;
>  
> +	if (dev_is_mdev(dev)) {
> +		dev = dev_mdev_parent(dev);
> +		if (WARN_ON(!dev_is_pci(dev)))
> +			return NULL;
> +	}
> +
>  	if (dev_is_pci(dev)) {
>  		struct pci_dev *pf_pdev;
>  
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 518df72..46cde66 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -35,6 +35,22 @@ struct pasid_table {
>  	struct list_head	dev;		/* device list */
>  };
>  
> +static inline bool dev_is_mdev(struct device *dev)
> +{
> +	if (!dev)
> +		return false;
> +
> +	return !strcmp(dev->bus->name, "mdev");
> +}

I assume we're not using mdev_bus_type because mdev is a loadable
module and that symbol doesn't exist in this statically loaded driver,
but strcmp is a pretty ugly alternative.  Could we use symbol_get() so
that we can use mdev_bus_type?  Thanks,

Alex

> +
> +static inline struct device *dev_mdev_parent(struct device *dev)
> +{
> +	if (WARN_ON(!dev_is_mdev(dev)))
> +		return NULL;
> +
> +	return dev->parent;
> +}
> +
>  extern u32 intel_pasid_max_id;
>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
>  void intel_pasid_free_id(int pasid);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-24 11:30       ` Jean-Philippe Brucker
@ 2018-07-24 19:51         ` Alex Williamson
  2018-07-25  2:06         ` Lu Baolu
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Alex Williamson @ 2018-07-24 19:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Kirti Wankhede, Raj, Ashok, kvm, Kumar, Sanjay K, iommu,
	linux-kernel, Sun, Yi Y, Pan, Jacob jun

On Tue, 24 Jul 2018 12:30:35 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Hi Baolu,
> 
> On 24/07/18 03:22, Lu Baolu wrote:
> > Hi,
> > 
> > On 07/23/2018 12:44 PM, Liu, Yi L wrote:  
> >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> >>> Sent: Sunday, July 22, 2018 2:09 PM
> >>>
> >>> With the Intel IOMMU supporting PASID granularity isolation and protection, a
> >>> mediated device could be isolated and protected by an IOMMU unit. We need to
> >>> allocate a new group instead of a PCI group.  
> >> Existing vfio mdev framework also allocates an iommu group for mediate device.
> >>
> >> mdev_probe()
> >>   |_ mdev_attach_iommu()
> >>        |_ iommu_group_alloc()  
> > 
> > When external components ask iommu to allocate a group for a device,
> > it will call pci_device_group in Intel IOMMU driver's @device_group
> > callback. In another word, current Intel IOMMU driver doesn't aware
> > the mediated device and treat all devices as PCI ones. This patch
> > extends the @device_group call back to make it aware of a mediated
> > device.  
> 
> I agree that allocating two groups for an mdev seems strange, and in my
> opinion we shouldn't export the notion of mdev to IOMMU drivers.

Yep, I was just thinking the same thing.  This is too highly integrated
into VT-d and too narrowly focused on PASID being the only way that an
mdev could make use of the IOMMU.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device
  2018-07-24 18:50   ` Alex Williamson
@ 2018-07-25  1:18     ` Lu Baolu
  0 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-25  1:18 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, David Woodhouse, Kirti Wankhede, ashok.raj,
	sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun,
	peterx, iommu, kvm, linux-kernel, Jacob Pan

Hi Alex,

On 07/25/2018 02:50 AM, Alex Williamson wrote:
> On Sun, 22 Jul 2018 14:09:24 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>
>> This patch adds the support to get the iommu device for a mediated
>> device. The assumption is that each mediated device is a minimal
>> assignable set of a physical PCI device. Hence, we should use the
>> iommu device of the parent PCI device to manage the translation.
> Hmm, is this absolutely a valid assumption?  I'm afraid there's an
> assumption throughout this series that the only way an mdev device
> could be interacting with the IOMMU is via S-IOV, but we could choose
> today to make an mdev wrapper for any device, which might provide basic
> RID granularity to the IOMMU.  So if I make an mdev wrapper for a PF
> such that I can implement migration for that device, is it valid for
> the IOMMU driver to flag me as an mdev device and base mappings on my
> parent device? 

You are right. We should not make it SIOV centric. Let me look into the
patches and identify/fix the unreasonable assumptions.

>  Perhaps in this patch we can assume that the parent of
> such an mdev device would be the PF backing it and that results in the
> correct drhd,

Okay.

>  but in the next patch we start imposing the assumption
> that because the device is an mdev, the only valid association is via
> pasid, which I'd say is incorrect.

You are right. I will find and fix it.

>  
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/iommu/intel-iommu.c |  6 ++++++
>>  drivers/iommu/intel-pasid.h | 16 ++++++++++++++++
>>  2 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 7d198ea..fc3ac1c 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
>>  	if (iommu_dummy(dev))
>>  		return NULL;
>>  
>> +	if (dev_is_mdev(dev)) {
>> +		dev = dev_mdev_parent(dev);
>> +		if (WARN_ON(!dev_is_pci(dev)))
>> +			return NULL;
>> +	}
>> +
>>  	if (dev_is_pci(dev)) {
>>  		struct pci_dev *pf_pdev;
>>  
>> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
>> index 518df72..46cde66 100644
>> --- a/drivers/iommu/intel-pasid.h
>> +++ b/drivers/iommu/intel-pasid.h
>> @@ -35,6 +35,22 @@ struct pasid_table {
>>  	struct list_head	dev;		/* device list */
>>  };
>>  
>> +static inline bool dev_is_mdev(struct device *dev)
>> +{
>> +	if (!dev)
>> +		return false;
>> +
>> +	return !strcmp(dev->bus->name, "mdev");
>> +}
> I assume we're not using mdev_bus_type because mdev is a loadable
> module and that symbol doesn't exist in this statically loaded driver,

Yes.

> but strcmp is a pretty ugly alternative.  Could we use symbol_get() so
> that we can use mdev_bus_type?  Thanks,

Sure.

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-24 11:30       ` Jean-Philippe Brucker
  2018-07-24 19:51         ` Alex Williamson
@ 2018-07-25  2:06         ` Lu Baolu
  2018-07-25  2:16         ` Tian, Kevin
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Lu Baolu @ 2018-07-25  2:06 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun, Tian, Kevin

Hi Jean,

Pull Kevin in. Not sure why he was removed from cc list.

On 07/24/2018 07:30 PM, Jean-Philippe Brucker wrote:
> Hi Baolu,
>
> On 24/07/18 03:22, Lu Baolu wrote:
>> Hi,
>>
>> On 07/23/2018 12:44 PM, Liu, Yi L wrote:
>>>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>>>> Sent: Sunday, July 22, 2018 2:09 PM
>>>>
>>>> With the Intel IOMMU supporting PASID granularity isolation and protection, a
>>>> mediated device could be isolated and protected by an IOMMU unit. We need to
>>>> allocate a new group instead of a PCI group.
>>> Existing vfio mdev framework also allocates an iommu group for mediate device.
>>>
>>> mdev_probe()
>>>   |_ mdev_attach_iommu()
>>>        |_ iommu_group_alloc()
>> When external components ask iommu to allocate a group for a device,
>> it will call pci_device_group in Intel IOMMU driver's @device_group
>> callback. In another word, current Intel IOMMU driver doesn't aware
>> the mediated device and treat all devices as PCI ones. This patch
>> extends the @device_group call back to make it aware of a mediated
>> device.
> I agree that allocating two groups for an mdev seems strange, and in my
> opinion we shouldn't export the notion of mdev to IOMMU drivers.
> Otherwise each driver will have to add its own "dev_is_mdev()" special
> cases, which will get messy in the long run. Besides, the macro is
> currently private, and to be exported it should be wrapped in
> symbol_get/put(mdev_bus_type).

I agree with you.

It should be better if we can make it in a driver agnostic way.

>
> There is another way: as we're planning to add a generic pasid_alloc()
> function to the IOMMU API, the mdev module itself could allocate a
> default PASID for each mdev by calling pasid_alloc() on the mdev's
> parent, and then do map()/unmap() with that PASID. This way we don't
> have to add IOMMU ops to the mdev bus, everything can still be done
> using the ops of the parent. And IOMMU drivers "only" have to implement
> PASID ops, which will be reused by drivers other than mdev.

A quick question when I thinking about this is how to allocate a domain
for the mediated device who uses the default pasid for DMA transactions?

Best regards,
Lu Baolu

>
> The allocated PASID also needs to be installed into the parent device.
> If the mdev module knows the PASID, we can do that by adding
> set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to
> mdev_parent_ops.
>
> Thanks,
> Jean
>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-24 11:30       ` Jean-Philippe Brucker
  2018-07-24 19:51         ` Alex Williamson
  2018-07-25  2:06         ` Lu Baolu
@ 2018-07-25  2:16         ` Tian, Kevin
  2018-07-25  2:35         ` Liu, Yi L
       [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com>
  4 siblings, 0 replies; 32+ messages in thread
From: Tian, Kevin @ 2018-07-25  2:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Lu Baolu, Liu, Yi L, Joerg Roedel,
	David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

> From: Jean-Philippe Brucker
> Sent: Tuesday, July 24, 2018 7:31 PM
> 
> Hi Baolu,
> 
> On 24/07/18 03:22, Lu Baolu wrote:
> > Hi,
> >
> > On 07/23/2018 12:44 PM, Liu, Yi L wrote:
> >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> >>> Sent: Sunday, July 22, 2018 2:09 PM
> >>>
> >>> With the Intel IOMMU supporting PASID granularity isolation and
> protection, a
> >>> mediated device could be isolated and protected by an IOMMU unit.
> We need to
> >>> allocate a new group instead of a PCI group.
> >> Existing vfio mdev framework also allocates an iommu group for
> mediate device.
> >>
> >> mdev_probe()
> >>   |_ mdev_attach_iommu()
> >>        |_ iommu_group_alloc()
> >
> > When external components ask iommu to allocate a group for a device,
> > it will call pci_device_group in Intel IOMMU driver's @device_group
> > callback. In another word, current Intel IOMMU driver doesn't aware
> > the mediated device and treat all devices as PCI ones. This patch
> > extends the @device_group call back to make it aware of a mediated
> > device.
> 
> I agree that allocating two groups for an mdev seems strange, and in my
> opinion we shouldn't export the notion of mdev to IOMMU drivers.
> Otherwise each driver will have to add its own "dev_is_mdev()" special
> cases, which will get messy in the long run. Besides, the macro is
> currently private, and to be exported it should be wrapped in
> symbol_get/put(mdev_bus_type).

There is only one group per mdev, but I agree that avoiding mdev
awareness in IOMMU driver is definitely good... We didn't find a
good way before, which is why current RFC was implemented that 
way.

> 
> There is another way: as we're planning to add a generic pasid_alloc()
> function to the IOMMU API, the mdev module itself could allocate a
> default PASID for each mdev by calling pasid_alloc() on the mdev's
> parent, and then do map()/unmap() with that PASID. This way we don't
> have to add IOMMU ops to the mdev bus, everything can still be done
> using the ops of the parent. And IOMMU drivers "only" have to implement
> PASID ops, which will be reused by drivers other than mdev.

yes, PASID is more general abstraction than mdev. However there is
one problem. Please note with new scalable mode now PASID is not
used just for SVA. You can do 1st-level, 2nd-level and nested in PASID
granularity, meaning each PASID can be bound to an unique iommu
domain. However today IOMMU driver allows only one iommu_domain 
per device. If we simply install allocated PASID to parent device, we 
should also remove that iommu_domain limitation. Is it the way that
we want to pursue?

mdev avoids such problem as it introduces a separate device...

> 
> The allocated PASID also needs to be installed into the parent device.
> If the mdev module knows the PASID, we can do that by adding
> set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to
> mdev_parent_ops.
> 

Thanks
Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-24 11:30       ` Jean-Philippe Brucker
                           ` (2 preceding siblings ...)
  2018-07-25  2:16         ` Tian, Kevin
@ 2018-07-25  2:35         ` Liu, Yi L
       [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com>
  4 siblings, 0 replies; 32+ messages in thread
From: Liu, Yi L @ 2018-07-25  2:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Lu Baolu, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun, Tian, Kevin

Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Tuesday, July 24, 2018 7:31 PM
> 
> Hi Baolu,
> 
> On 24/07/18 03:22, Lu Baolu wrote:
> > Hi,
> >
> > On 07/23/2018 12:44 PM, Liu, Yi L wrote:
> >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> >>> Sent: Sunday, July 22, 2018 2:09 PM
> >>>
> >>> With the Intel IOMMU supporting PASID granularity isolation and protection, a
> >>> mediated device could be isolated and protected by an IOMMU unit. We need to
> >>> allocate a new group instead of a PCI group.
> >> Existing vfio mdev framework also allocates an iommu group for mediate device.
> >>
> >> mdev_probe()
> >>   |_ mdev_attach_iommu()
> >>        |_ iommu_group_alloc()
> >
> > When external components ask iommu to allocate a group for a device,
> > it will call pci_device_group in Intel IOMMU driver's @device_group
> > callback. In another word, current Intel IOMMU driver doesn't aware
> > the mediated device and treat all devices as PCI ones. This patch
> > extends the @device_group call back to make it aware of a mediated
> > device.
> 
> I agree that allocating two groups for an mdev seems strange, and in my

There will not be two groups for a mdev. Pls refer to Patch 08/10 of this
series. Baolu added iommu_ops check when doing group allocation in
mdev_attach_iommu().

[RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus

@@ -21,6 +21,13 @@ static int mdev_attach_iommu(struct mdev_device *mdev)
 	int ret;
 	struct iommu_group *group;
 
+	/*
+	 * If iommu_ops is set for bus, add_device() will allocate
+	 * a group and add the device in the group.
+	 */
+	if (iommu_present(mdev->dev.bus))
+		return 0;
+

> opinion we shouldn't export the notion of mdev to IOMMU drivers.

The key idea of this RFC is to tag iommu domain with PASID, if any mdev
is added to such a domain, it would get the PASID and config in its parent.
Thus the transactions from mdev can be isolated in iommu hardware.

Based on this idea, mdevs can be managed in a flexible manner. e.g.
if two mdevs are assigned to same VM, they can share PASID. This share
can be easily achieve by adding them to the same domain. If we default
allocate a PASID for each mdev, it may be a waste.

With vendor-specific iommu driver handle the mdev difference, it can
largely keep the fundamental iommu concepts in current software
implementation.

> Otherwise each driver will have to add its own "dev_is_mdev()" special
> cases, which will get messy in the long run. Besides, the macro is
> currently private, and to be exported it should be wrapped in
> symbol_get/put(mdev_bus_type).

Agreed. Should figure out a better manner.

> There is another way: as we're planning to add a generic pasid_alloc()
> function to the IOMMU API, the mdev module itself could allocate a
> default PASID for each mdev by calling pasid_alloc() on the mdev's
> parent, and then do map()/unmap() with that PASID. This way we don't

so far, map/unmap is per-domain operation. In this way, passing PASID makes
it be kind of per-device operation. This may affect too much of existing software
implementation.

> have to add IOMMU ops to the mdev bus, everything can still be done
> using the ops of the parent. And IOMMU drivers "only" have to implement
> PASID ops, which will be reused by drivers other than mdev.
> 
> The allocated PASID also needs to be installed into the parent device.
> If the mdev module knows the PASID, we can do that by adding
> set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to
> mdev_parent_ops.

Your idea is fascinating. Pls feel free let us know if we missed any from you. :)

> Thanks,
> Jean

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
       [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com>
@ 2018-07-25  6:19           ` Tian, Kevin
  2018-07-25 19:19             ` Jean-Philippe Brucker
  0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2018-07-25  6:19 UTC (permalink / raw)
  To: 'Jean-Philippe Brucker',
	Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

> From: Tian, Kevin
> Sent: Wednesday, July 25, 2018 10:16 AM
> 
[...]
> >
> > There is another way: as we're planning to add a generic pasid_alloc()
> > function to the IOMMU API, the mdev module itself could allocate a
> > default PASID for each mdev by calling pasid_alloc() on the mdev's
> > parent, and then do map()/unmap() with that PASID. This way we don't
> > have to add IOMMU ops to the mdev bus, everything can still be done
> > using the ops of the parent. And IOMMU drivers "only" have to
> implement
> > PASID ops, which will be reused by drivers other than mdev.
> 
> yes, PASID is more general abstraction than mdev. However there is
> one problem. Please note with new scalable mode now PASID is not
> used just for SVA. You can do 1st-level, 2nd-level and nested in PASID
> granularity, meaning each PASID can be bound to an unique iommu
> domain. However today IOMMU driver allows only one iommu_domain
> per device. If we simply install allocated PASID to parent device, we
> should also remove that iommu_domain limitation. Is it the way that
> we want to pursue?
> 
> mdev avoids such problem as it introduces a separate device...

another thing to think about is to simplify the caller logic. It would
be good to have consistent IOMMU APIs cross PCI device and 
mdev, for common VFIO operations e.g. map/unmap, iommu_
attach_group, etc. If we need special version ops with PASID, it 
brings burden to VFIO and other IOMMU clients...

For IOMMU-capable mdev, there are two use cases which have 
been talked so far:

1) mdev with PASID-granular DMA isolation
2) mdev inheriting RID from parent device (no sharing)

1) is what this RFC is currently for. 2) is what Alex concerned 
which is not covered in RFC yet.

In my mind, an ideal design is like below:

a) VFIO provides an interface for parent driver to specify 
whether a mdev is IOMMU capable, with flag to indicate 
whether it's PASID-granular or RID-inherit;

b) Once an IOMMU-capable mdev is created, VFIO treats it no
different from normal physical devices, using exactly same 
IOMMU APIs to operate (including future SVA). VFIO doesn't 
care whether PASID or RID will be used in hardware;

c) IOMMU driver then handle RID/PASID difference internally, 
based on its own configuration and mdev type.

To support above goal, I feel a new device handle is a nice fit 
to represent mdev within IOMMU driver, with same set of 
capabilities as observed on its parent device.

Jean, maybe I didn't fully understand your proposal. Can you
elaborate whether above goal can be achieved with it?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-25  6:19           ` Tian, Kevin
@ 2018-07-25 19:19             ` Jean-Philippe Brucker
  2018-07-26  3:03               ` Tian, Kevin
       [not found]               ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com>
  0 siblings, 2 replies; 32+ messages in thread
From: Jean-Philippe Brucker @ 2018-07-25 19:19 UTC (permalink / raw)
  To: Tian, Kevin, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

On 25/07/18 07:19, Tian, Kevin wrote:
>> From: Tian, Kevin
>> Sent: Wednesday, July 25, 2018 10:16 AM
>>
> [...]
>>>
>>> There is another way: as we're planning to add a generic pasid_alloc()
>>> function to the IOMMU API, the mdev module itself could allocate a
>>> default PASID for each mdev by calling pasid_alloc() on the mdev's
>>> parent, and then do map()/unmap() with that PASID. This way we don't
>>> have to add IOMMU ops to the mdev bus, everything can still be done
>>> using the ops of the parent. And IOMMU drivers "only" have to
>> implement
>>> PASID ops, which will be reused by drivers other than mdev.
>>
>> yes, PASID is more general abstraction than mdev. However there is
>> one problem. Please note with new scalable mode now PASID is not
>> used just for SVA. You can do 1st-level, 2nd-level and nested in PASID
>> granularity, meaning each PASID can be bound to an unique iommu
>> domain. However today IOMMU driver allows only one iommu_domain
>> per device. If we simply install allocated PASID to parent device, we
>> should also remove that iommu_domain limitation. Is it the way that
>> we want to pursue?
>>
>> mdev avoids such problem as it introduces a separate device...
> 
> another thing to think about is to simplify the caller logic. It would
> be good to have consistent IOMMU APIs cross PCI device and 
> mdev, for common VFIO operations e.g. map/unmap, iommu_
> attach_group, etc. If we need special version ops with PASID, it 
> brings burden to VFIO and other IOMMU clients...

Yes, providing special ops is the simplest to implement (or least
invasive, I guess) but isn't particularly elegant. See the patch from
Jordan below, that I was planning to add to the SVA series (I'm
laboriously working towards next version) and my email from last week:
https://patchwork.kernel.org/patch/10412251/

Whenever I come back to hierarchical IOMMU domains I reject it as too
complicated, but maybe that is what we need. I find it difficult to
reason about because domains currently represent both a collection of
devices and a one or more address spaces. I proposed the io_mm thing to
represent a single address space, and to avoid adding special cases to
every function in the IOMMU subsystem that manipulates domains.

> For IOMMU-capable mdev, there are two use cases which have 
> been talked so far:
> 
> 1) mdev with PASID-granular DMA isolation
> 2) mdev inheriting RID from parent device (no sharing)

Would that be a single mdev per parent device? Otherwise I don't really
understand how it would work without all map/unmap operations going to
the parent device's driver.

> 1) is what this RFC is currently for. 2) is what Alex concerned 
> which is not covered in RFC yet.
> 
> In my mind, an ideal design is like below:
> 
> a) VFIO provides an interface for parent driver to specify 
> whether a mdev is IOMMU capable, with flag to indicate 
> whether it's PASID-granular or RID-inherit;
> 
> b) Once an IOMMU-capable mdev is created, VFIO treats it no
> different from normal physical devices, using exactly same 
> IOMMU APIs to operate (including future SVA). VFIO doesn't 
> care whether PASID or RID will be used in hardware;
> 
> c) IOMMU driver then handle RID/PASID difference internally, 
> based on its own configuration and mdev type.
> 
> To support above goal, I feel a new device handle is a nice fit 
> to represent mdev within IOMMU driver, with same set of 
> capabilities as observed on its parent device.

It's a good abstraction, but I'm still concerned about other users of
PASID-granular DMA isolation, for example GPU drivers wanting to improve
isolation of DMA bufs, will want the same functionality without going
through the vfio-mdev module.

The IOMMU operations we care about don't take a device handle, I think,
just a domain. And VFIO itself only deals with domains when doing
map/unmap. Maybe we could add this operation to the IOMMU subsystem:

child_domain = domain_create_child(parent_dev, parent_domain)

A child domain would be a smaller isolation granule, getting a PASID if
that's what the IOMMU implements or another mechanism for 2). It is
automatically attached to its parent's devices, so attach/detach
operations wouldn't be necessary on the child.

Depending on the underlying architecture the child domain can support
map/unmap on a single stage, or map/unmap for 2nd level and bind_pgtable
for 1st level.

I'm not sure how this works for host SVA though. I think the
sva_bind_dev() API could stay the same, but the internals will need
to change.

Thanks,
Jean

-----
I was going to send the following patch for dealing with private PASIDs.
I used it for a vfio-mdev prototype internally: VFIO would call
iommu_sva_alloc_pasid on the parent device and then sva_map/sva_unmap on
the parent domain + allocated io_mm (which redirects to
iommu_map/iommu_unmap when there is no io_mm). Since it relies on io_mm
instead of child domains, it duplicates the iommu ops, and as discussed
might not be adapted to Vt-d scalable mode.

Subject: [PATCH] iommu: sva: Add support for private PASIDs

Provide an API for allocating PASIDs and populating them manually. To ease
cleanup and factor allocation code, reuse the io_mm structure for private
PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to
multiple devices. The mm_alloc() IOMMU op must now check if the mm
argument is NULL, in which case it should allocate io_pgtables instead of
binding to an mm.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 211 +++++++++++++++++++++++++++++++++-----
 drivers/iommu/iommu.c     |  49 ++++++---
 include/linux/iommu.h     | 118 +++++++++++++++++++--
 3 files changed, 331 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 35d0f387fbef..12c9b861014d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -15,11 +15,11 @@
 /**
  * DOC: io_mm model
  *
- * The io_mm keeps track of process address spaces shared between CPU and
IOMMU.
- * The following example illustrates the relation between structures
- * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between
io_mm and
- * device. A device can have multiple io_mm and an io_mm may be bound to
- * multiple devices.
+ * When used with the bind()/unbind() functions, the io_mm keeps track of
+ * process address spaces shared between CPU and IOMMU. The following example
+ * illustrates the relation between structures iommu_domain, io_mm and
+ * iommu_bond. An iommu_bond is a link between io_mm and device. A device can
+ * have multiple io_mm and an io_mm may be bound to multiple devices.
  *              ___________________________
  *             |  IOMMU domain A           |
  *             |  ________________         |
@@ -97,6 +97,12 @@
  * non-PASID translations. In this case PASID 0 is reserved and entry 0
points
  * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be
held in
  * the device table and PASID 0 would be available to the allocator.
+ *
+ * The io_mm can also represent a private IOMMU address space, which isn't
+ * shared with a process. The device driver calls iommu_sva_alloc_pasid which
+ * returns an io_mm that can be populated with the iommu_sva_map/unmap
+ * functions. The principle is the same as shared io_mm, except that a
private
+ * io_mm cannot be bound to multiple devices.
  */

 struct iommu_bond {
@@ -130,6 +136,9 @@ static DEFINE_SPINLOCK(iommu_sva_lock);

 static struct mmu_notifier_ops iommu_mmu_notifier;

+#define io_mm_is_private(io_mm) ((io_mm) != NULL && (io_mm)->mm == NULL)
+#define io_mm_is_shared(io_mm) ((io_mm) != NULL && (io_mm)->mm != NULL)
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	    struct mm_struct *mm, unsigned long flags)
@@ -148,19 +157,10 @@ io_mm_alloc(struct iommu_domain *domain, struct
device *dev,
 	if (!io_mm)
 		return ERR_PTR(-ENOMEM);

-	/*
-	 * The mm must not be freed until after the driver frees the io_mm
-	 * (which may involve unpinning the CPU ASID for instance, requiring a
-	 * valid mm struct.)
-	 */
-	mmgrab(mm);
-
 	io_mm->flags		= flags;
 	io_mm->mm		= mm;
-	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
 	INIT_LIST_HEAD(&io_mm->devices);
-	/* Leave kref to zero until the io_mm is fully initialized */

 	idr_preload(GFP_KERNEL);
 	spin_lock(&iommu_sva_lock);
@@ -175,6 +175,32 @@ io_mm_alloc(struct iommu_domain *domain, struct
device *dev,
 		goto err_free_mm;
 	}

+	return io_mm;
+
+err_free_mm:
+	io_mm->release(io_mm);
+	return ERR_PTR(ret);
+}
+
+static struct io_mm *
+io_mm_alloc_shared(struct iommu_domain *domain, struct device *dev,
+		   struct mm_struct *mm, unsigned long flags)
+{
+	int ret;
+	struct io_mm *io_mm;
+
+	io_mm = io_mm_alloc(domain, dev, mm, flags);
+	if (IS_ERR(io_mm))
+		return io_mm;
+
+	/*
+	 * The mm must not be freed until after the driver frees the io_mm
+	 * (which may involve unpinning the CPU ASID for instance, requiring a
+	 * valid mm struct.)
+	 */
+	mmgrab(mm);
+
+	io_mm->notifier.ops = &iommu_mmu_notifier;
 	ret = mmu_notifier_register(&io_mm->notifier, mm);
 	if (ret)
 		goto err_free_pasid;
@@ -202,7 +228,6 @@ io_mm_alloc(struct iommu_domain *domain, struct device
*dev,
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 	spin_unlock(&iommu_sva_lock);

-err_free_mm:
 	io_mm->release(io_mm);
 	mmdrop(mm);

@@ -231,6 +256,11 @@ static void io_mm_release(struct kref *kref)
 	/* The PASID can now be reallocated for another mm. */
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);

+	if (io_mm_is_private(io_mm)) {
+		io_mm->release(io_mm);
+		return;
+	}
+
 	/*
 	 * If we're being released from mm exit, the notifier callback ->release
 	 * has already been called. Otherwise we don't need ->release, the io_mm
@@ -258,7 +288,7 @@ static int io_mm_get_locked(struct io_mm *io_mm)
 	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
 		/*
 		 * kref_get_unless_zero doesn't provide ordering for reads. This
-		 * barrier pairs with the one in io_mm_alloc.
+		 * barrier pairs with the one in io_mm_alloc_shared.
 		 */
 		smp_rmb();
 		return 1;
@@ -289,7 +319,7 @@ static int io_mm_attach(struct iommu_domain *domain,
struct device *dev,
 	struct iommu_sva_param *param = dev->iommu_param->sva_param;

 	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
-	    !domain->ops->mm_invalidate)
+	    (io_mm_is_shared(io_mm) && !domain->ops->mm_invalidate))
 		return -ENODEV;

 	if (pasid > param->max_pasid || pasid < param->min_pasid)
@@ -550,7 +580,7 @@ int iommu_sva_device_init(struct device *dev, unsigned
long features,
 	struct iommu_sva_param *param;
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

-	if (!domain || !domain->ops->sva_device_init)
+	if (!domain)
 		return -ENODEV;

 	if (features & ~IOMMU_SVA_FEAT_IOPF)
@@ -584,9 +614,11 @@ int iommu_sva_device_init(struct device *dev,
unsigned long features,
 	 * IOMMU driver updates the limits depending on the IOMMU and device
 	 * capabilities.
 	 */
-	ret = domain->ops->sva_device_init(dev, param);
-	if (ret)
-		goto err_free_param;
+	if (domain->ops->sva_device_init) {
+		ret = domain->ops->sva_device_init(dev, param);
+		if (ret)
+			goto err_free_param;
+	}

 	dev->iommu_param->sva_param = param;
 	mutex_unlock(&dev->iommu_param->lock);
@@ -688,7 +720,7 @@ int __iommu_sva_bind_device(struct device *dev, struct
mm_struct *mm,
 	}

 	if (!io_mm) {
-		io_mm = io_mm_alloc(domain, dev, mm, flags);
+		io_mm = io_mm_alloc_shared(domain, dev, mm, flags);
 		if (IS_ERR(io_mm))
 			return PTR_ERR(io_mm);
 	}
@@ -723,6 +755,9 @@ int __iommu_sva_unbind_device(struct device *dev, int
pasid)
 	/* spin_lock_irq matches the one in wait_event_lock_irq */
 	spin_lock_irq(&iommu_sva_lock);
 	list_for_each_entry(bond, &param->mm_list, dev_head) {
+		if (io_mm_is_private(bond->io_mm))
+			continue;
+
 		if (bond->io_mm->pasid == pasid) {
 			io_mm_detach_locked(bond, true);
 			ret = 0;
@@ -765,6 +800,136 @@ void __iommu_sva_unbind_dev_all(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);

+/*
+ * iommu_sva_alloc_pasid - Allocate a private PASID
+ *
+ * Allocate a PASID for private map/unmap operations. Create a new I/O
address
+ * space for this device, that isn't bound to any process.
+ *
+ * iommu_sva_device_init must have been called first.
+ */
+int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **out)
+{
+	int ret;
+	struct io_mm *io_mm;
+	struct iommu_domain *domain;
+	struct iommu_sva_param *param = dev->iommu_param->sva_param;
+
+	if (!out || !param)
+		return -EINVAL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	io_mm = io_mm_alloc(domain, dev, NULL, 0);
+	if (IS_ERR(io_mm))
+		return PTR_ERR(io_mm);
+
+	kref_init(&io_mm->kref);
+
+	ret = io_mm_attach(domain, dev, io_mm, NULL);
+	if (ret) {
+		io_mm_put(io_mm);
+		return ret;
+	}
+
+	*out = io_mm;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
+
+void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm)
+{
+	struct iommu_bond *bond;
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return;
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		if (bond->dev == dev) {
+			io_mm_detach_locked(bond, false);
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
+
+int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		  unsigned long iova, phys_addr_t paddr, size_t size, int prot)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return -ENODEV;
+
+	return __iommu_map(domain, io_mm, iova, paddr, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map);
+
+size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			unsigned long iova, struct scatterlist *sg,
+			unsigned int nents, int prot)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return -ENODEV;
+
+	return domain->ops->map_sg(domain, io_mm, iova, sg, nents, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map_sg);
+
+size_t iommu_sva_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, size_t size)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_unmap(domain, io_mm, iova, size, true);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap);
+
+size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size)
+{
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	return __iommu_unmap(domain, io_mm, iova, size, false);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap_fast);
+
+phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+				   struct io_mm *io_mm, dma_addr_t iova)
+{
+	if (!io_mm)
+		return iommu_iova_to_phys(domain, iova);
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return 0;
+
+	if (unlikely(domain->ops->sva_iova_to_phys == NULL))
+		return 0;
+
+	return domain->ops->sva_iova_to_phys(domain, io_mm, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_iova_to_phys);
+
+void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm
*io_mm,
+			     unsigned long iova, size_t size)
+{
+	if (!io_mm) {
+		iommu_tlb_range_add(domain, iova, size);
+		return;
+	}
+
+	if (WARN_ON(io_mm_is_shared(io_mm)))
+		return;
+
+	if (domain->ops->sva_iotlb_range_add != NULL)
+		domain->ops->sva_iotlb_range_add(domain, io_mm, iova, size);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_tlb_range_add);
+
 /**
  * iommu_sva_find() - Find mm associated to the given PASID
  * @pasid: Process Address Space ID assigned to the mm
@@ -779,7 +944,7 @@ struct mm_struct *iommu_sva_find(int pasid)

 	spin_lock(&iommu_sva_lock);
 	io_mm = idr_find(&iommu_pasid_idr, pasid);
-	if (io_mm && io_mm_get_locked(io_mm)) {
+	if (io_mm_is_shared(io_mm) && io_mm_get_locked(io_mm)) {
 		if (mmget_not_zero(io_mm->mm))
 			mm = io_mm->mm;

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b02e1d32c733..26ac9b2a813e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1816,8 +1816,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	return pgsize;
 }

-int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, size_t size, int prot)
+int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		unsigned long iova, phys_addr_t paddr, size_t size, int prot)
 {
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
@@ -1825,7 +1825,8 @@ int iommu_map(struct iommu_domain *domain, unsigned
long iova,
 	phys_addr_t orig_paddr = paddr;
 	int ret = 0;

-	if (unlikely(domain->ops->map == NULL ||
+	if (unlikely((!io_mm && domain->ops->map == NULL) ||
+		     (io_mm && domain->ops->sva_map == NULL) ||
 		     domain->pgsize_bitmap == 0UL))
 		return -ENODEV;

@@ -1854,7 +1855,12 @@ int iommu_map(struct iommu_domain *domain, unsigned
long iova,
 		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 			 iova, &paddr, pgsize);

-		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		if (io_mm)
+			ret = domain->ops->sva_map(domain, io_mm, iova, paddr,
+						   pgsize, prot);
+		else
+			ret = domain->ops->map(domain, iova, paddr, pgsize,
+					       prot);
 		if (ret)
 			break;

@@ -1865,24 +1871,30 @@ int iommu_map(struct iommu_domain *domain,
unsigned long iova,

 	/* unroll mapping in case something went wrong */
 	if (ret)
-		iommu_unmap(domain, orig_iova, orig_size - size);
+		__iommu_unmap(domain, io_mm, orig_iova, orig_size - size, true);
 	else
 		trace_map(orig_iova, orig_paddr, orig_size);

 	return ret;
 }
+
+int iommu_map(struct iommu_domain *domain, unsigned long iov,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	return __iommu_map(domain, NULL, iov, paddr, size, prot);
+}
 EXPORT_SYMBOL_GPL(iommu_map);

-static size_t __iommu_unmap(struct iommu_domain *domain,
-			    unsigned long iova, size_t size,
-			    bool sync)
+size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+		     unsigned long iova, size_t size, bool sync)
 {
 	const struct iommu_ops *ops = domain->ops;
 	size_t unmapped_page, unmapped = 0;
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;

-	if (unlikely(ops->unmap == NULL ||
+	if (unlikely((!io_mm && ops->unmap == NULL) ||
+		     (io_mm && ops->sva_unmap == NULL) ||
 		     domain->pgsize_bitmap == 0UL))
 		return 0;

@@ -1912,7 +1924,11 @@ static size_t __iommu_unmap(struct iommu_domain
*domain,
 	while (unmapped < size) {
 		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);

-		unmapped_page = ops->unmap(domain, iova, pgsize);
+		if (io_mm)
+			unmapped_page = ops->sva_unmap(domain, io_mm, iova,
+						       pgsize);
+		else
+			unmapped_page = ops->unmap(domain, iova, pgsize);
 		if (!unmapped_page)
 			break;

@@ -1936,19 +1952,20 @@ static size_t __iommu_unmap(struct iommu_domain
*domain,
 size_t iommu_unmap(struct iommu_domain *domain,
 		   unsigned long iova, size_t size)
 {
-	return __iommu_unmap(domain, iova, size, true);
+	return __iommu_unmap(domain, NULL, iova, size, true);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);

 size_t iommu_unmap_fast(struct iommu_domain *domain,
 			unsigned long iova, size_t size)
 {
-	return __iommu_unmap(domain, iova, size, false);
+	return __iommu_unmap(domain, NULL, iova, size, false);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap_fast);

-size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
-			 struct scatterlist *sg, unsigned int nents, int prot)
+size_t default_iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, struct scatterlist *sg, unsigned
+			    int nents, int prot)
 {
 	struct scatterlist *s;
 	size_t mapped = 0;
@@ -1972,7 +1989,7 @@ size_t default_iommu_map_sg(struct iommu_domain
*domain, unsigned long iova,
 		if (!IS_ALIGNED(s->offset, min_pagesz))
 			goto out_err;

-		ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
+		ret = __iommu_map(domain, io_mm, iova + mapped, phys, s->length, prot);
 		if (ret)
 			goto out_err;

@@ -1983,7 +2000,7 @@ size_t default_iommu_map_sg(struct iommu_domain
*domain, unsigned long iova,

 out_err:
 	/* undo mappings already done */
-	iommu_unmap(domain, iova, mapped);
+	__iommu_unmap(domain, io_mm, iova, mapped, true);

 	return 0;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b525f5636df8..1f63a8691173 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,13 +248,17 @@ struct iommu_sva_param {
  * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @sva_map: map a physically contiguous memory region to an address space
+ * @sva_unmap: unmap a physically contiguous memory region from an
address space
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
  *          to an iommu domain
  * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
  * @tlb_range_add: Add a given iova range to the flush queue for this domain
+ * @sva_iotlb_range_add: Add a given iova range to the flush queue for
this mm
  * @tlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
  * @iova_to_phys: translate iova to physical address
+ * @sva_iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
  * @remove_device: remove device from iommu grouping
  * @device_group: find iommu group for a particular device
@@ -302,13 +306,24 @@ struct iommu_ops {
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size);
-	size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
-			 struct scatterlist *sg, unsigned int nents, int prot);
+	int (*sva_map)(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
+	size_t (*sva_unmap)(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size);
+	size_t (*map_sg)(struct iommu_domain *domain, struct io_mm *io_mm,
+			 unsigned long iova, struct scatterlist *sg,
+			 unsigned int nents, int prot);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
 	void (*iotlb_range_add)(struct iommu_domain *domain,
 				unsigned long iova, size_t size);
+	void (*sva_iotlb_range_add)(struct iommu_domain *domain,
+				    struct io_mm *io_mm, unsigned long iova,
+				    size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
+	phys_addr_t (*sva_iova_to_phys)(struct iommu_domain *domain,
+					struct io_mm *io_mm, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);
 	struct iommu_group *(*device_group)(struct device *dev);
@@ -528,15 +543,20 @@ extern int iommu_sva_invalidate(struct iommu_domain
*domain,
 		struct device *dev, struct tlb_invalidate_info *inv_info);

 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
+extern size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size, bool sync);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 			  size_t size);
 extern size_t iommu_unmap_fast(struct iommu_domain *domain,
 			       unsigned long iova, size_t size);
-extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned
long iova,
-				struct scatterlist *sg,unsigned int nents,
-				int prot);
+extern size_t default_iommu_map_sg(struct iommu_domain *domain, struct
io_mm *io_mm,
+				   unsigned long iova, struct scatterlist *sg,
+				   unsigned int nents, int prot);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
@@ -622,7 +642,7 @@ static inline size_t iommu_map_sg(struct iommu_domain
*domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot)
 {
-	return domain->ops->map_sg(domain, iova, sg, nents, prot);
+	return domain->ops->map_sg(domain, NULL, iova, sg, nents, prot);
 }

 /* PCI device grouping function */
@@ -710,12 +730,25 @@ static inline struct iommu_domain
*iommu_get_domain_for_dev(struct device *dev)
 	return NULL;
 }

+static inline int __iommu_map(struct iommu_domain *domain, struct io_mm
*io_mm,
+			      unsigned long iova, phys_addr_t paddr,
+			      size_t size, int prot)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, size_t size, int prot)
 {
 	return -ENODEV;
 }

+static inline size_t __iommu_unmap(struct iommu_domain *domain, struct
io_mm *io_mm,
+		     unsigned long iova, size_t size, bool sync)
+{
+	return 0;
+}
+
 static inline size_t iommu_unmap(struct iommu_domain *domain,
 				 unsigned long iova, size_t size)
 {
@@ -729,8 +762,9 @@ static inline size_t iommu_unmap_fast(struct
iommu_domain *domain,
 }

 static inline size_t iommu_map_sg(struct iommu_domain *domain,
-				  unsigned long iova, struct scatterlist *sg,
-				  unsigned int nents, int prot)
+				  struct io_mm *io_mm, unsigned long iova,
+				  struct scatterlist *sg, unsigned int nents,
+				  int prot)
 {
 	return 0;
 }
@@ -1017,6 +1051,22 @@ extern int __iommu_sva_bind_device(struct device
*dev, struct mm_struct *mm,
 extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void __iommu_sva_unbind_dev_all(struct device *dev);

+int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **io_mm);
+void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm);
+
+int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm,
+		  unsigned long iova, phys_addr_t paddr, size_t size, int prot);
+size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm,
+			unsigned long iova, struct scatterlist *sg,
+			unsigned int nents, int prot);
+size_t iommu_sva_unmap(struct iommu_domain *domain,
+		       struct io_mm *io_mm, unsigned long iova, size_t size);
+size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm,
+			    unsigned long iova, size_t size);
+phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+				   struct io_mm *io_mm, dma_addr_t iova);
+void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm
*io_mm,
+			     unsigned long iova, size_t size);
 extern struct mm_struct *iommu_sva_find(int pasid);
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
@@ -1048,6 +1098,58 @@ static inline void
__iommu_sva_unbind_dev_all(struct device *dev)
 {
 }

+static inline struct io_mm *
+iommu_sva_alloc_pasid(struct iommu_domain *domain, struct device *dev)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void iommu_sva_free_pasid(struct io_mm *io_mm, struct
device *dev)
+{
+}
+
+static inline int iommu_sva_map(struct iommu_domain *domain,
+				struct io_mm *io_mm, unsigned long iova,
+				phys_addr_t paddr, size_t size, int prot)
+{
+	return -EINVAL;
+}
+
+static inline size_t iommu_sva_map_sg(struct iommu_domain *domain,
+				      struct io_mm *io_mm, unsigned long iova,
+				      struct scatterlist *sg,
+				      unsigned int nents, int prot)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap(struct iommu_domain *domain,
+				     struct io_mm *io_mm, unsigned long iova,
+				     size_t size)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap_fast(struct iommu_domain *domain,
+					  struct io_mm *io_mm,
+					  unsigned long iova, size_t size)
+{
+	return 0;
+}
+
+static inline phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain,
+						 struct io_mm *io_mm,
+						 dma_addr_t iova)
+{
+	return 0;
+}
+
+static inline void iommu_sva_tlb_range_add(struct iommu_domain *domain,
+					   struct io_mm *io_mm,
+					   unsigned long iova, size_t size)
+{
+}
+
 static inline struct mm_struct *iommu_sva_find(int pasid)
 {
 	return NULL;
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-25 19:19             ` Jean-Philippe Brucker
@ 2018-07-26  3:03               ` Tian, Kevin
  2018-07-26 15:09                 ` Jean-Philippe Brucker
       [not found]               ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2018-07-26  3:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Lu Baolu, Liu, Yi L, Joerg Roedel,
	David Woodhouse, Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Thursday, July 26, 2018 3:20 AM
> 
> On 25/07/18 07:19, Tian, Kevin wrote:
> >> From: Tian, Kevin
> >> Sent: Wednesday, July 25, 2018 10:16 AM
> >>
> > [...]
> >>>
> >>> There is another way: as we're planning to add a generic pasid_alloc()
> >>> function to the IOMMU API, the mdev module itself could allocate a
> >>> default PASID for each mdev by calling pasid_alloc() on the mdev's
> >>> parent, and then do map()/unmap() with that PASID. This way we don't
> >>> have to add IOMMU ops to the mdev bus, everything can still be done
> >>> using the ops of the parent. And IOMMU drivers "only" have to
> >> implement
> >>> PASID ops, which will be reused by drivers other than mdev.
> >>
> >> yes, PASID is more general abstraction than mdev. However there is
> >> one problem. Please note with new scalable mode now PASID is not
> >> used just for SVA. You can do 1st-level, 2nd-level and nested in PASID
> >> granularity, meaning each PASID can be bound to an unique iommu
> >> domain. However today IOMMU driver allows only one iommu_domain
> >> per device. If we simply install allocated PASID to parent device, we
> >> should also remove that iommu_domain limitation. Is it the way that
> >> we want to pursue?
> >>
> >> mdev avoids such problem as it introduces a separate device...
> >
> > another thing to think about is to simplify the caller logic. It would
> > be good to have consistent IOMMU APIs cross PCI device and
> > mdev, for common VFIO operations e.g. map/unmap, iommu_
> > attach_group, etc. If we need special version ops with PASID, it
> > brings burden to VFIO and other IOMMU clients...
> 
> Yes, providing special ops is the simplest to implement (or least
> invasive, I guess) but isn't particularly elegant. See the patch from
> Jordan below, that I was planning to add to the SVA series (I'm
> laboriously working towards next version) and my email from last week:
> https://patchwork.kernel.org/patch/10412251/
> 
> Whenever I come back to hierarchical IOMMU domains I reject it as too
> complicated, but maybe that is what we need. I find it difficult to
> reason about because domains currently represent both a collection of
> devices and a one or more address spaces. I proposed the io_mm thing to
> represent a single address space, and to avoid adding special cases to
> every function in the IOMMU subsystem that manipulates domains.

I suppose io_mm is still under iommu_domain, right? this is different
from hierarchical iommu domain concept...

> 
> > For IOMMU-capable mdev, there are two use cases which have
> > been talked so far:
> >
> > 1) mdev with PASID-granular DMA isolation
> > 2) mdev inheriting RID from parent device (no sharing)
> 
> Would that be a single mdev per parent device? Otherwise I don't really
> understand how it would work without all map/unmap operations going to
> the parent device's driver.

yes, that is why I said "no sharing". In that special case, host driver 
itself doesn't do DMA. Only the mdev does. 

> 
> > 1) is what this RFC is currently for. 2) is what Alex concerned
> > which is not covered in RFC yet.
> >
> > In my mind, an ideal design is like below:
> >
> > a) VFIO provides an interface for parent driver to specify
> > whether a mdev is IOMMU capable, with flag to indicate
> > whether it's PASID-granular or RID-inherit;
> >
> > b) Once an IOMMU-capable mdev is created, VFIO treats it no
> > different from normal physical devices, using exactly same
> > IOMMU APIs to operate (including future SVA). VFIO doesn't
> > care whether PASID or RID will be used in hardware;
> >
> > c) IOMMU driver then handle RID/PASID difference internally,
> > based on its own configuration and mdev type.
> >
> > To support above goal, I feel a new device handle is a nice fit
> > to represent mdev within IOMMU driver, with same set of
> > capabilities as observed on its parent device.
> 
> It's a good abstraction, but I'm still concerned about other users of
> PASID-granular DMA isolation, for example GPU drivers wanting to improve
> isolation of DMA bufs, will want the same functionality without going
> through the vfio-mdev module.

for GPU I think you meant SVA. that one anyway needs its own 
interface as what we have been discussing in yours and Jacob's
series.

Here mdev is orthogonal to a specific capability like SVA. It is 
sort of logical representation of subset resource of parent device,
on top of which we can enable IOMMU capabilities including SVA.

I'm not sure whether it is good to combine two requirements closely...

> 
> The IOMMU operations we care about don't take a device handle, I think,
> just a domain. And VFIO itself only deals with domains when doing
> map/unmap. Maybe we could add this operation to the IOMMU subsystem:
> 
> child_domain = domain_create_child(parent_dev, parent_domain)
> 
> A child domain would be a smaller isolation granule, getting a PASID if
> that's what the IOMMU implements or another mechanism for 2). It is
> automatically attached to its parent's devices, so attach/detach
> operations wouldn't be necessary on the child.
> 
> Depending on the underlying architecture the child domain can support
> map/unmap on a single stage, or map/unmap for 2nd level and
> bind_pgtable
> for 1st level.
> 
> I'm not sure how this works for host SVA though. I think the
> sva_bind_dev() API could stay the same, but the internals will need
> to change.
> 

hierarchical domain might be the right way to go, but let's do more
thinking on any corner cases. 

One open is whether iommu domain can fully carry all required 
attributes on mdev. Note today for physical device each vendor 
driver maintains a device structure for device specific info which 
may impact IOMMU setting (e.g. struct device_domain_info in 
intel-iommu, and struct arm_smmu_device in arm-smmu). If we 
want mdev to have a different attribute as its parent device, then 
new representation might be required. But honestly speaking I 
don't think it is a valid requirement now, since physically finally
it is still the IOMMU structure of parent device being configured,
so mdev should just inherit same attributes as parent. If the role 
of mdev representation in current RFC is just to connect iommu_
domain when hierarchical domain is missing, then we might 
instead just make the latter happen...

Let's think more on this direction. btw can you elaborate any 
other complexities when you evaluated this option earlier?

Thanks
Kevin


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
       [not found]               ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com>
@ 2018-07-26  3:28                 ` Tian, Kevin
  2018-07-26 15:09                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2018-07-26  3:28 UTC (permalink / raw)
  To: 'Jean-Philippe Brucker',
	Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

> From: Tian, Kevin
> Sent: Thursday, July 26, 2018 11:04 AM
[...]
> >
> > The IOMMU operations we care about don't take a device handle, I think,
> > just a domain. And VFIO itself only deals with domains when doing
> > map/unmap. Maybe we could add this operation to the IOMMU
> subsystem:
> >
> > child_domain = domain_create_child(parent_dev, parent_domain)
> >
> > A child domain would be a smaller isolation granule, getting a PASID if
> > that's what the IOMMU implements or another mechanism for 2). It is
> > automatically attached to its parent's devices, so attach/detach
> > operations wouldn't be necessary on the child.
> >
> > Depending on the underlying architecture the child domain can support
> > map/unmap on a single stage, or map/unmap for 2nd level and
> > bind_pgtable
> > for 1st level.
> >
> > I'm not sure how this works for host SVA though. I think the
> > sva_bind_dev() API could stay the same, but the internals will need
> > to change.
> >
> 
> hierarchical domain might be the right way to go, but let's do more
> thinking on any corner cases.
> 

btw maybe we don't need make it 'hierarchical', as maintaining
hierarchy usually brings more work. What we require is possibly
just the capability of having one device bind to multiple 
iommu_domains. One domain is reserved for parent driver's
own DMA activities (e.g. serving DMA APIs), while other domains
are auxiliary and can be tagged with a PASID (or any other identifier
which IOMMU can use to support multiple domains). Such identifiers 
may be automatically provisioned when auxiliary domain is attached, 
i.e. not requiring an explicit request from caller. IMO it's safe to 
assume that supporting multiple iommu domains anyway implies 
some finer-grained capability than RID-based in underlying IOMMU.
Then there is no need of parent/child concept.

thoughts?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-26  3:03               ` Tian, Kevin
@ 2018-07-26 15:09                 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 32+ messages in thread
From: Jean-Philippe Brucker @ 2018-07-26 15:09 UTC (permalink / raw)
  To: Tian, Kevin, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

On 26/07/18 04:03, Tian, Kevin wrote:
>> Whenever I come back to hierarchical IOMMU domains I reject it as too
>> complicated, but maybe that is what we need. I find it difficult to
>> reason about because domains currently represent both a collection of
>> devices and a one or more address spaces. I proposed the io_mm thing to
>> represent a single address space, and to avoid adding special cases to
>> every function in the IOMMU subsystem that manipulates domains.
> 
> I suppose io_mm is still under iommu_domain, right? this is different
> from hierarchical iommu domain concept...

Yes, my initial solution is io_mm attached to domains, but in the rest
of the mail yesterday I tried to explore a way to replace io_mm with
child domains instead. In my current patches, io_mm when used for
private PASIDs is basically a lightweight child domain. I don't know
which solution is best or if mdev-aware IOMMU is still preferable. For
the moment I'll keep the private PASID proposal that uses io_mm, until
we've thought a bit more about this.

[...]
>> It's a good abstraction, but I'm still concerned about other users of
>> PASID-granular DMA isolation, for example GPU drivers wanting to improve
>> isolation of DMA bufs, will want the same functionality without going
>> through the vfio-mdev module.
> 
> for GPU I think you meant SVA. that one anyway needs its own 
> interface as what we have been discussing in yours and Jacob's
> series.

Actually I didn't mean SVA. People would like to allocate PASIDs in
kernel drivers and do iommu_map/unmap on them, without binding process
address spaces. Not counting hardware validation, I've actually seen
more demand for private PASID management than for SVA.

See for example the discussion on RFCv2 of SVA, or Jordan's series:
https://www.spinics.net/lists/arm-kernel/msg611038.html
https://lwn.net/Articles/747889/

It would be good if mdev and non-mdev could reuse most of the same code
for private PASID, whatever it turns out to be

> Here mdev is orthogonal to a specific capability like SVA. It is 
> sort of logical representation of subset resource of parent device,
> on top of which we can enable IOMMU capabilities including SVA.
> 
> I'm not sure whether it is good to combine two requirements closely...
> 
>>
>> The IOMMU operations we care about don't take a device handle, I think,
>> just a domain. And VFIO itself only deals with domains when doing
>> map/unmap. Maybe we could add this operation to the IOMMU subsystem:
>>
>> child_domain = domain_create_child(parent_dev, parent_domain)
>>
>> A child domain would be a smaller isolation granule, getting a PASID if
>> that's what the IOMMU implements or another mechanism for 2). It is
>> automatically attached to its parent's devices, so attach/detach
>> operations wouldn't be necessary on the child.
>>
>> Depending on the underlying architecture the child domain can support
>> map/unmap on a single stage, or map/unmap for 2nd level and
>> bind_pgtable
>> for 1st level.
>>
>> I'm not sure how this works for host SVA though. I think the
>> sva_bind_dev() API could stay the same, but the internals will need
>> to change.

Thinking more about this, the SVA case seems a bit nasty. If we replaced
io_mm with iommu_domain for SVA, a child domain would represent a single
process address space, and therefore have multiple parent domains... Not
sure we should go down that road. Maybe we should keep SVA separate, and
keep io_mm to be a wrapper for mm_struct


> hierarchical domain might be the right way to go, but let's do more
> thinking on any corner cases. 
> 
> One open is whether iommu domain can fully carry all required 
> attributes on mdev. Note today for physical device each vendor 
> driver maintains a device structure for device specific info which 
> may impact IOMMU setting (e.g. struct device_domain_info in 
> intel-iommu, and struct arm_smmu_device in arm-smmu). If we 
> want mdev to have a different attribute as its parent device, then 
> new representation might be required. But honestly speaking I 
> don't think it is a valid requirement now, since physically finally
> it is still the IOMMU structure of parent device being configured,
> so mdev should just inherit same attributes as parent. If the role 
> of mdev representation in current RFC is just to connect iommu_
> domain when hierarchical domain is missing, then we might 
> instead just make the latter happen...

Agreed, the mdev would inherit properties of its parent since physically
the IOMMU sees DMA transactions coming from the parent

> Let's think more on this direction. btw can you elaborate any 
> other complexities when you evaluated this option earlier?

I can't think of anything right now - my prototype worked well with
iommu_sva_alloc_pasid, but the goal was mainly for me to understand mdev
using a toy DMA engine, I didn't spend time thinking about corner cases.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices
  2018-07-26  3:28                 ` Tian, Kevin
@ 2018-07-26 15:09                   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 32+ messages in thread
From: Jean-Philippe Brucker @ 2018-07-26 15:09 UTC (permalink / raw)
  To: Tian, Kevin, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse,
	Alex Williamson, Kirti Wankhede
  Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y,
	Pan, Jacob jun

On 26/07/18 04:28, Tian, Kevin wrote:
>> hierarchical domain might be the right way to go, but let's do more
>> thinking on any corner cases.
>>
> 
> btw maybe we don't need make it 'hierarchical', as maintaining
> hierarchy usually brings more work. What we require is possibly
> just the capability of having one device bind to multiple 
> iommu_domains. One domain is reserved for parent driver's
> own DMA activities (e.g. serving DMA APIs), while other domains
> are auxiliary and can be tagged with a PASID (or any other identifier
> which IOMMU can use to support multiple domains). Such identifiers 
> may be automatically provisioned when auxiliary domain is attached, 
> i.e. not requiring an explicit request from caller. IMO it's safe to 
> assume that supporting multiple iommu domains anyway implies 
> some finer-grained capability than RID-based in underlying IOMMU.
> Then there is no need of parent/child concept.

Right, we probably don't need a hierarchy. I like this model (it's
actually the one I favor for supporting PASID in virtio-iommu), though
I'm hoping we can avoid changing the logic of iommu_attach/detach, and
instead introduce a new "get_child_domain", "attach_aux_domain" or
simply "clone" op.

Currently the attach_dev() op toggles the domain of a device. For
example a device automatically gets a default domain for kernel DMA,
then VFIO attaches a new domain for assigning to a VM. attach_dev()
disables the default domain and installs fresh page tables. detach_dev()
isn't called, and the SMMU drivers don't even implement it. If we wanted
to reuse attach_dev() for auxiliary domains, we'd need some flag to
differentiate the "add auxiliary domain" operation from the "detach
everything and attach one new domain" one

Thanks,
Jean

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-07-26 15:09 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-22  6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu
2018-07-23  4:44   ` Liu, Yi L
2018-07-24  2:06     ` Lu Baolu
2018-07-24 18:50   ` Alex Williamson
2018-07-25  1:18     ` Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu
2018-07-23  4:44   ` Liu, Yi L
2018-07-24  2:09     ` Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu
2018-07-23  4:44   ` Liu, Yi L
2018-07-24  2:22     ` Lu Baolu
2018-07-24 11:30       ` Jean-Philippe Brucker
2018-07-24 19:51         ` Alex Williamson
2018-07-25  2:06         ` Lu Baolu
2018-07-25  2:16         ` Tian, Kevin
2018-07-25  2:35         ` Liu, Yi L
     [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com>
2018-07-25  6:19           ` Tian, Kevin
2018-07-25 19:19             ` Jean-Philippe Brucker
2018-07-26  3:03               ` Tian, Kevin
2018-07-26 15:09                 ` Jean-Philippe Brucker
     [not found]               ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com>
2018-07-26  3:28                 ` Tian, Kevin
2018-07-26 15:09                   ` Jean-Philippe Brucker
2018-07-22  6:09 ` [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu
2018-07-23  4:44   ` Liu, Yi L
2018-07-24  2:29     ` Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type Lu Baolu
2018-07-22  6:09 ` [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).