* [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
From: Liu, Yi L @ 2017-04-26 10:11 UTC
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker

Hi,

This patchset introduces the kernel side (IOMMU and VFIO) of Shared
Virtual Memory (SVM) virtualization for intel_iommu. The complete
solution touches QEMU, the IOMMU layer, and VFIO.

The QEMU changes are posted as a separate patchset: "[RFC PATCH 0/20]
Qemu: Extend intel_iommu emulator to support Shared Virtual Memory".

This patchset adds two new IOMMU APIs and their implementation in the
intel_iommu driver. In VFIO, it adds two ioctl commands on
container->fd to propagate data from QEMU to kernel space.

[Patch Overview]
* 1 adds iommu API definition for binding guest PASID table
* 2 adds binding PASID table API implementation in VT-d iommu driver
* 3 adds iommu API definition to do IOMMU TLB invalidation from guest
* 4 adds IOMMU TLB invalidation implementation in VT-d iommu driver
* 5 adds VFIO IOCTL for propagating PASID table binding from guest
* 6 adds processing of pasid table binding in vfio_iommu_type1
* 7 adds VFIO IOCTL for propagating IOMMU TLB invalidation from guest
* 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1

Best Wishes,
Yi L


Jacob Pan (3):
  iommu: Introduce bind_pasid_table API function
  iommu/vt-d: add bind_pasid_table function
  iommu/vt-d: Add iommu do invalidate function

Liu, Yi L (5):
  iommu: Introduce iommu do invalidate API function
  VFIO: Add new IOCTL for PASID Table bind propagation
  VFIO: do pasid table binding
  VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  VFIO: do IOMMU TLB invalidation from guest

 drivers/iommu/intel-iommu.c     | 146 ++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c           |  32 +++++++++
 drivers/vfio/vfio_iommu_type1.c |  98 +++++++++++++++++++++++++++
 include/linux/dma_remapping.h   |   1 +
 include/linux/intel-iommu.h     |  11 +++
 include/linux/iommu.h           |  47 +++++++++++++
 include/uapi/linux/vfio.h       |  26 +++++++
 7 files changed, 361 insertions(+)

-- 
1.9.1

* [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
From: Liu, Yi L @ 2017-04-26 10:11 UTC
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Jacob Pan, Liu,
	Yi L

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
case in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

As part of the proposed architecture, when an SVM-capable PCI
device is assigned to a guest, nested mode is turned on. The guest owns
the first-level page tables (used for requests with PASID) and performs
GVA->GPA translation. The second-level page tables are owned by the host
and perform GPA->HPA translation for requests both with and without PASID.
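
To make the two-stage flow concrete, here is a toy userspace model. It
is only a sketch: each stage is a flat array indexed by page frame,
whereas real VT-d tables are multi-level and walked by hardware. All
names (toy_stage, toy_walk, nested_translate, second_level_only) are
made up for illustration and do not appear in this series.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_INVALID	UINT64_MAX

/* Hypothetical flat table: index = input page frame, value = output page frame */
struct toy_stage {
	const uint64_t *pfn;
	size_t nr_pages;
};

/* Translate one address through a single stage, keeping the page offset */
static uint64_t toy_walk(const struct toy_stage *st, uint64_t addr)
{
	uint64_t idx = addr >> TOY_PAGE_SHIFT;

	if (idx >= st->nr_pages || st->pfn[idx] == TOY_INVALID)
		return TOY_INVALID;
	return (st->pfn[idx] << TOY_PAGE_SHIFT) | (addr & TOY_PAGE_MASK);
}

/* Request with PASID: guest-owned first level, then host-owned second level */
static uint64_t nested_translate(const struct toy_stage *first,
				 const struct toy_stage *second, uint64_t gva)
{
	uint64_t gpa = toy_walk(first, gva);

	if (gpa == TOY_INVALID)
		return TOY_INVALID;
	return toy_walk(second, gpa);
}

/* Request without PASID: only the host-owned second level applies */
static uint64_t second_level_only(const struct toy_stage *second, uint64_t gpa)
{
	return toy_walk(second, gpa);
}
```

The point of the model is that the host never needs to read the first
stage: it only has to tell the hardware where the guest's tables live,
which is what the bind interface below is for.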

A new IOMMU driver interface is therefore needed to perform tasks as
follows:
* Enable nested translation and appropriate translation type
* Assign guest PASID table pointer (in GPA) and size to host IOMMU

This patch introduces two new functions, iommu_bind_pasid_table() and
iommu_unbind_pasid_table(), to the IOMMU API. Architecture-specific
IOMMU drivers can later implement the callbacks to perform the steps
needed to bind the PASID table of an assigned device.

This patch also adds a model definition to iommu.h. It is used to
check whether a bind request comes from a compatible entity; e.g. a bind
request from an intel_iommu emulator may not be supported by an ARM SMMU
driver.
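
The dispatch and the model check can be tried outside the kernel with a
small mock. This is a sketch under stated assumptions: struct
pasid_table_info and the two model values mirror the definitions in the
diff below, while mock_domain, mock_ops, mock_bind_pasid_table() and
mock_vtd_bind() are hypothetical stand-ins for the real domain, ops
table, API wrapper and driver callback.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INTEL_IOMMU	(1 << 0)
#define ARM_SMMU	(1 << 1)

/* Userspace copy of the structure added to iommu.h by this patch */
struct pasid_table_info {
	uint64_t ptr;		/* PASID table pointer */
	uint64_t size;		/* PASID table size */
	uint32_t model;		/* which IOMMU model produced the request */
	uint8_t opaque[];	/* IOMMU-specific details */
};

struct mock_domain;

/* Hypothetical, reduced ops table with only the bind callback */
struct mock_ops {
	int (*bind_pasid_table)(struct mock_domain *domain,
				struct pasid_table_info *info);
};

struct mock_domain {
	const struct mock_ops *ops;
};

/* Generic wrapper: fail if the driver does not implement the callback */
static int mock_bind_pasid_table(struct mock_domain *domain,
				 struct pasid_table_info *info)
{
	if (!domain->ops->bind_pasid_table)
		return -EINVAL;
	return domain->ops->bind_pasid_table(domain, info);
}

/* Driver callback: reject requests that come from another IOMMU model */
static int mock_vtd_bind(struct mock_domain *domain,
			 struct pasid_table_info *info)
{
	(void)domain;
	if (info == NULL || info->model != INTEL_IOMMU)
		return -EINVAL;
	return 0;
}
```

With an ops table whose bind_pasid_table points at mock_vtd_bind(), a
request tagged INTEL_IOMMU is accepted and one tagged ARM_SMMU gets
-EINVAL; a domain whose ops leave the callback NULL also gets -EINVAL
from the wrapper.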

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/iommu.c | 19 +++++++++++++++++++
 include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dbe7f65..f2da636 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+			struct pasid_table_info *pasidt_binfo)
+{
+	if (unlikely(!domain->ops->bind_pasid_table))
+		return -EINVAL;
+
+	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
+
+int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+	if (unlikely(!domain->ops->unbind_pasid_table))
+		return -EINVAL;
+
+	return domain->ops->unbind_pasid_table(domain, dev);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0ff5111..491a011 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -131,6 +131,15 @@ struct iommu_dm_region {
 	int			prot;
 };
 
+struct pasid_table_info {
+	__u64	ptr;	/* PASID table ptr */
+	__u64	size;	/* PASID table size*/
+	__u32	model;	/* magic number */
+#define INTEL_IOMMU	(1 << 0)
+#define ARM_SMMU	(1 << 1)
+	__u8	opaque[];/* IOMMU-specific details */
+};
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -159,6 +168,8 @@ struct iommu_dm_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @bind_pasid_table: bind pasid table pointer for guest SVM
+ * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -200,6 +211,10 @@ struct iommu_ops {
 	u32 (*domain_get_windows)(struct iommu_domain *domain);
 
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+	int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
+				struct pasid_table_info *pasidt_binfo);
+	int (*unbind_pasid_table)(struct iommu_domain *domain,
+				struct device *dev);
 
 	unsigned long pgsize_bitmap;
 };
@@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_bind_pasid_table(struct iommu_domain *domain,
+		struct device *dev, struct pasid_table_info *pasidt_binfo);
+extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
+				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
@@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+			struct pasid_table_info *pasidt_binfo)
+{
+	return -EINVAL;
+}
+static inline
+int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
-- 
1.9.1

* [RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function
From: Liu, Yi L @ 2017-04-26 10:11 UTC
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Jacob Pan, Liu,
	Yi L

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

Add Intel VT-d ops for the generic iommu_(un)bind_pasid_table API
functions.

The primary use case is direct assignment of an SVM-capable
device. The request originates from the emulated IOMMU in the guest and
passes through several layers (e.g. VFIO). When calling the host IOMMU
driver, the caller passes the guest PASID table pointer (GPA) and size.

The Intel IOMMU specific bind_pasid_table function modifies the device
context table entry to turn on nesting mode and select the matching
translation type.

The unbind operation restores the default context mapping.
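
The context entry update performed by the bind path can be sketched in
isolation as plain bit manipulation. The three CONTEXT_* flag values
below are taken from dma_remapping.h as shown in this patch. The
translation-type field position and width (SK_TT_*), the 4KiB page mask
(SK_PAGE_MASK) and the struct sk_context layout are assumptions made
for this sketch; they stand in for CONTEXT_TT_MASK,
CONTEXT_TT_DEV_IOTLB, VTD_PAGE_MASK and the context[0].lo /
context[1].lo pair used in the diff, whose definitions are not part of
this patch.

```c
#include <stdint.h>

/* From include/linux/dma_remapping.h as modified by this patch */
#define CONTEXT_PRS	(1ULL << 9)
#define CONTEXT_NESTE	(1ULL << 10)
#define CONTEXT_PASIDE	(1ULL << 11)

/* Assumed for this sketch only; not shown in the patch */
#define SK_TT_SHIFT	2
#define SK_TT_MASK	(7ULL << SK_TT_SHIFT)
#define SK_TT_DEV_IOTLB	1ULL
#define SK_PAGE_MASK	(~0xfffULL)

/* Hypothetical stand-in for context[0].lo and context[1].lo */
struct sk_context {
	uint64_t lo0;
	uint64_t lo1;
};

static void sk_bind(struct sk_context *ctx, uint64_t gpasid_ptr, uint64_t pts)
{
	uint64_t lo = ctx->lo0;

	/* Enable nesting, page requests and PASID processing */
	lo |= CONTEXT_NESTE | CONTEXT_PRS | CONTEXT_PASIDE;
	/* Replace the translation type with the device-IOTLB type */
	lo &= ~SK_TT_MASK;
	lo |= SK_TT_DEV_IOTLB << SK_TT_SHIFT;
	ctx->lo0 = lo;

	/* Page-aligned guest PASID table pointer plus the size field */
	ctx->lo1 = (gpasid_ptr & SK_PAGE_MASK) | pts;
}
```

In the real driver these stores happen under iommu->lock and are
followed by a write barrier and context-cache and IOTLB flushes, which
the sketch leaves out.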

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c   | 103 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma_remapping.h |   1 +
 2 files changed, 104 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 646756c..6d5b939 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5306,6 +5306,105 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
 
 	return iommu;
 }
+
+static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
+		struct device *dev, struct pasid_table_info *pasidt_binfo)
+{
+	struct intel_iommu *iommu;
+	struct context_entry *context;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	u8 bus, devfn;
+	u16 did, *sid;
+	int ret = 0;
+	unsigned long flags;
+	u64 ctx_lo;
+
+	if (pasidt_binfo == NULL || pasidt_binfo->model != INTEL_IOMMU) {
+		pr_warn("%s: Invalid bind request!\n", __func__);
+		return -EINVAL;
+	}
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	sid = (u16 *)&pasidt_binfo->opaque;
+	/* check SID, if it is not correct, return */
+	if (PCI_DEVID(bus, devfn) != *sid)
+		return 0;
+
+	info = dev->archdata.iommu;
+	if (!info || !info->pasid_supported) {
+		pr_err("Device %d:%d.%d has no pasid support\n", bus,
+			PCI_SLOT(devfn), PCI_FUNC(devfn));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (pasidt_binfo->size >= intel_iommu_get_pts(iommu)) {
+		pr_err("Invalid gPASID table size %llu, host size %lu\n",
+			pasidt_binfo->size,
+			intel_iommu_get_pts(iommu));
+		ret = -EINVAL;
+		goto out;
+	}
+	spin_lock_irqsave(&iommu->lock, flags);
+	context = iommu_context_addr(iommu, bus, devfn, 0);
+	if (!context || !context_present(context)) {
+		pr_warn("%s: ctx not present for bus devfn %x:%x\n",
+			__func__, bus, devfn);
+		spin_unlock_irqrestore(&iommu->lock, flags);
+		goto out;
+	}
+	/* Anticipate guest to use SVM and owns the first level */
+	ctx_lo = context[0].lo;
+	ctx_lo |= CONTEXT_NESTE;
+	ctx_lo |= CONTEXT_PRS;
+	ctx_lo |= CONTEXT_PASIDE;
+	ctx_lo &= ~CONTEXT_TT_MASK;
+	ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
+	context[0].lo = ctx_lo;
+
+	/* Assign guest PASID table pointer and size */
+	ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
+	context[1].lo = ctx_lo;
+	/* make sure context entry is updated before flushing */
+	wmb();
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	iommu->flush.flush_context(iommu, did,
+				(((u16)bus) << 8) | devfn,
+				DMA_CCMD_MASK_NOBIT,
+				DMA_CCMD_DEVICE_INVL);
+	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+
+out:
+	return ret;
+}
+
+static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
+					struct device *dev)
+{
+	struct intel_iommu *iommu;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u8 bus, devfn;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+	/*
+	 * REVISIT: we might want to clear the PASID table pointer
+	 * as part of context clear operation. Currently, it leaves
+	 * stale data but should be ignored by hardware since PASIDE
+	 * is clear.
+	 */
+	/* ATS will be reenabled when remapping is restored */
+	pci_disable_ats(to_pci_dev(dev));
+	domain_context_clear(iommu, dev);
+	return domain_context_mapping_one(dmar_domain, iommu, bus, devfn);
+}
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
 static const struct iommu_ops intel_iommu_ops = {
@@ -5314,6 +5413,10 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
 	.domain_free	= intel_iommu_domain_free,
 	.attach_dev	= intel_iommu_attach_device,
 	.detach_dev	= intel_iommu_detach_device,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.bind_pasid_table	= intel_iommu_bind_pasid_table,
+	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
+#endif
 	.map		= intel_iommu_map,
 	.unmap		= intel_iommu_unmap,
 	.map_sg		= default_iommu_map_sg,
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 187c102..c03b62a 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -27,6 +27,7 @@
 
 #define CONTEXT_DINVE		(1ULL << 8)
 #define CONTEXT_PRS		(1ULL << 9)
+#define CONTEXT_NESTE		(1ULL << 10)
 #define CONTEXT_PASIDE		(1ULL << 11)
 
 struct intel_iommu;
-- 
1.9.1

* [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
From: Liu, Yi L @ 2017-04-26 10:12 UTC
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L,
	Jacob Pan

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

When an SVM-capable device is assigned to a guest, the first-level page
tables are owned by the guest and the guest PASID table pointer is
linked to the device context entry of the physical IOMMU.

The host IOMMU driver has no knowledge of caching structure updates
unless the guest invalidation activities are passed down to the host.
The primary use case is the emulated IOMMU in the guest, where QEMU
can trap invalidation activities before passing them down to the
host/physical IOMMU. Some of the required actions are specific to the
IOMMU architecture, so the generic API introduced in this patch
carries opaque data in the tlb_invalidate_info argument.
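
As an illustration of how a caller might fill the opaque area, the
sketch below packs a model-specific payload behind the generic header
and unpacks it again after checking the model. struct
tlb_invalidate_info mirrors the definition in the diff below; the
payload type sk_vtd_inv and the helpers sk_pack() and sk_unpack_vtd()
are hypothetical, and the real payload layout is defined elsewhere in
the series.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTEL_IOMMU	(1 << 0)

/* Userspace copy of the structure added to iommu.h by this patch */
struct tlb_invalidate_info {
	uint32_t model;
	uint8_t opaque[];
};

/* Hypothetical model-specific payload, for illustration only */
struct sk_vtd_inv {
	uint64_t desc_lo;
	uint64_t desc_hi;
};

/* Allocate header plus payload in one block; caller frees the result */
static struct tlb_invalidate_info *sk_pack(uint32_t model,
					   const void *payload, size_t len)
{
	struct tlb_invalidate_info *info = malloc(sizeof(*info) + len);

	if (!info)
		return NULL;
	info->model = model;
	if (len)
		memcpy(info->opaque, payload, len);
	return info;
}

/* Consumer side: check the model before interpreting the opaque bytes */
static int sk_unpack_vtd(const struct tlb_invalidate_info *info,
			 struct sk_vtd_inv *out)
{
	if (!info || info->model != INTEL_IOMMU)
		return -1;
	/* opaque[] starts right after a 32-bit field, so copy, do not cast */
	memcpy(out, info->opaque, sizeof(*out));
	return 0;
}
```

Copying out of opaque[] rather than casting the pointer avoids
misaligned access to 64-bit fields in this sketch.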

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/iommu.c | 13 +++++++++++++
 include/linux/iommu.h | 16 ++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2da636..ca7cff2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
 
+int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	int ret = 0;
+
+	if (unlikely(domain->ops->do_invalidate == NULL))
+		return -ENODEV;
+
+	ret = domain->ops->do_invalidate(domain, dev, inv_info);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_do_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 491a011..a48e3b75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -140,6 +140,11 @@ struct pasid_table_info {
 	__u8	opaque[];/* IOMMU-specific details */
 };
 
+struct tlb_invalidate_info {
+	__u32	model;
+	__u8	opaque[];
+};
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -215,6 +220,8 @@ struct iommu_ops {
 				struct pasid_table_info *pasidt_binfo);
 	int (*unbind_pasid_table)(struct iommu_domain *domain,
 				struct device *dev);
+	int (*do_invalidate)(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info);
 
 	unsigned long pgsize_bitmap;
 };
@@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain,
 		struct device *dev, struct pasid_table_info *pasidt_binfo);
 extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info);
+
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
@@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
 	return -EINVAL;
 }
 
+static inline int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
@ 2017-04-26 10:12   ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:12 UTC (permalink / raw)
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L,
	Jacob Pan

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

When a SVM capable device is assigned to a guest, the first level page
tables are owned by the guest and the guest PASID table pointer is
linked to the device context entry of the physical IOMMU.

Host IOMMU driver has no knowledge of caching structure updates unless
the guest invalidation activities are passed down to the host. The
primary usage is derived from emulated IOMMU in the guest, where QEMU
can trap invalidation activities before pass them down the
host/physical IOMMU. There are IOMMU architectural specific actions
need to be taken which requires the generic APIs introduced in this
patch to have opaque data in the tlb_invalidate_info argument.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/iommu.c | 13 +++++++++++++
 include/linux/iommu.h | 16 ++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2da636..ca7cff2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
 
+int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	int ret = 0;
+
+	if (unlikely(domain->ops->do_invalidate == NULL))
+		return -ENODEV;
+
+	ret = domain->ops->do_invalidate(domain, dev, inv_info);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_do_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 491a011..a48e3b75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -140,6 +140,11 @@ struct pasid_table_info {
 	__u8	opaque[];/* IOMMU-specific details */
 };
 
+struct tlb_invalidate_info {
+	__u32	model;
+	__u8	opaque[];
+};
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -215,6 +220,8 @@ struct iommu_ops {
 				struct pasid_table_info *pasidt_binfo);
 	int (*unbind_pasid_table)(struct iommu_domain *domain,
 				struct device *dev);
+	int (*do_invalidate)(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info);
 
 	unsigned long pgsize_bitmap;
 };
@@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain,
 		struct device *dev, struct pasid_table_info *pasidt_binfo);
 extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info);
+
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
@@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
 	return -EINVAL;
 }
 
+static inline int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
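
The model-plus-opaque layout described in the commit message above can
be sketched in plain userspace C. This is only an illustration, not
kernel code: the tlb_invalidate_info layout and the model values are
taken from patches 1/8 and 3/8, while demo_vendor_data and both demo_*
helpers are hypothetical names made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTEL_IOMMU	(1u << 0)	/* model values as defined in patch 1/8 */
#define ARM_SMMU	(1u << 1)

/* Userspace mirror of the structure added to include/linux/iommu.h. */
struct tlb_invalidate_info {
	uint32_t model;
	uint8_t  opaque[];	/* IOMMU-specific details */
};

/* Hypothetical vendor payload, used only for this illustration. */
struct demo_vendor_data {
	uint16_t sid;
	uint64_t desc_low;
};

/* Allocate a generic header followed by a copy of the vendor payload. */
static struct tlb_invalidate_info *demo_alloc_info(uint32_t model,
		const void *payload, size_t len)
{
	struct tlb_invalidate_info *info = malloc(sizeof(*info) + len);

	if (!info)
		return NULL;
	info->model = model;
	memcpy(info->opaque, payload, len);
	return info;
}

/* Stand-in for a vendor do_invalidate callback: reject foreign models. */
static int demo_intel_do_invalidate(const struct tlb_invalidate_info *info,
		struct demo_vendor_data *out)
{
	if (!info || info->model != INTEL_IOMMU)
		return -EINVAL;
	/* memcpy avoids assuming anything about the alignment of opaque[] */
	memcpy(out, info->opaque, sizeof(*out));
	return 0;
}
```

The generic layer only ever looks at the model field; everything after
it is interpreted by whichever driver recognises that model.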

* [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function
  2017-04-26 10:11 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:12   ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:12 UTC (permalink / raw)
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Jacob Pan, Liu,
	Yi L

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

This patch adds an Intel VT-d specific function that implements the
iommu_do_invalidate API.

The use case is caching structure invalidation for assigned SVM-capable
devices. The emulated IOMMU exposes the queued invalidation capability
and passes all descriptors from the guest down to the physical IOMMU.

The assumption is that the guest-to-host device ID mapping is resolved
before the IOMMU driver is called. Based on the device handle, the host
IOMMU driver can replace certain fields before submitting the descriptor
to the invalidation queue.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h | 11 +++++++++++
 2 files changed, 54 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d5b939..0b098ad 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
 	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
 }
 
+static int intel_iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	int ret = 0;
+	struct intel_iommu *iommu;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_invalidate_data *inv_data;
+	struct qi_desc *qi;
+	u16 did;
+	u8 bus, devfn;
+
+	if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
+		return -EINVAL;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
+
+	/* check SID */
+	if (PCI_DEVID(bus, devfn) != inv_data->sid)
+		return 0;
+
+	qi = &inv_data->inv_desc;
+
+	switch (qi->low & QI_TYPE_MASK) {
+	case QI_DIOTLB_TYPE:
+	case QI_DEIOTLB_TYPE:
+		/* for device IOTLB, we just let it pass through */
+		break;
+	default:
+		did = dmar_domain->iommu_did[iommu->seq_id];
+		set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
+		break;
+	}
+
+	ret = qi_submit_sync(qi, iommu);
+
+	return ret;
+}
+
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot)
@@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.bind_pasid_table	= intel_iommu_bind_pasid_table,
 	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
+	.do_invalidate		= intel_iommu_do_invalidate,
 #endif
 	.map		= intel_iommu_map,
 	.unmap		= intel_iommu_unmap,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ac04f28..9d6562c 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -29,6 +29,7 @@
 #include <linux/dma_remapping.h>
 #include <linux/mmu_notifier.h>
 #include <linux/list.h>
+#include <linux/bitops.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
 
@@ -271,6 +272,10 @@ enum {
 #define QI_PGRP_RESP_TYPE	0x9
 #define QI_PSTRM_RESP_TYPE	0xa
 
+#define QI_DID(did)		(((u64)did & 0xffff) << 16)
+#define QI_DID_MASK		GENMASK(31, 16)
+#define QI_TYPE_MASK		GENMASK(3, 0)
+
 #define QI_IEC_SELECTIVE	(((u64)1) << 4)
 #define QI_IEC_IIDEX(idx)	(((u64)(idx & 0xffff) << 32))
 #define QI_IEC_IM(m)		(((u64)(m & 0x1f) << 27))
@@ -529,6 +534,12 @@ struct intel_svm {
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
 #endif
 
+struct intel_invalidate_data {
+	u16 sid;
+	u32 pasid;
+	struct qi_desc inv_desc;
+};
+
 extern const struct attribute_group *intel_iommu_groups[];
 extern void intel_iommu_debugfs_init(void);
 extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
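
The field replacement described in the commit message above comes down
to a masked update of the low 64 bits of the descriptor. Below is a
minimal userspace sketch of that step. The masks mirror QI_DID_MASK,
QI_TYPE_MASK and QI_DID() from the patch; the two type values are
assumed to match QI_DIOTLB_TYPE and QI_DEIOTLB_TYPE in
include/linux/intel-iommu.h, and demo_fixup_desc is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 31:16 carry the domain ID, bits 3:0 the descriptor type. */
#define DEMO_QI_DID_MASK	UINT64_C(0xffff0000)
#define DEMO_QI_TYPE_MASK	UINT64_C(0xf)
#define DEMO_QI_DID(did)	(((uint64_t)(did) & 0xffff) << 16)

/* Assumed to match the header values for device-IOTLB descriptors. */
#define DEMO_QI_DIOTLB_TYPE	0x3
#define DEMO_QI_DEIOTLB_TYPE	0x8

/*
 * Return the low descriptor word with the guest domain ID replaced by
 * the host one. Device-IOTLB descriptors are returned unchanged, as in
 * the switch statement of the patch.
 */
static uint64_t demo_fixup_desc(uint64_t low, uint16_t host_did)
{
	switch (low & DEMO_QI_TYPE_MASK) {
	case DEMO_QI_DIOTLB_TYPE:
	case DEMO_QI_DEIOTLB_TYPE:
		return low;
	default:
		return (low & ~DEMO_QI_DID_MASK) | DEMO_QI_DID(host_did);
	}
}
```

Only bits 31:16 change in the default case; the type, the granularity
bits and the upper half of the word are left exactly as the guest
wrote them.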

* [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-04-26 10:11 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:12   ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:12 UTC (permalink / raw)
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
binding requests.

On VT-d, this IOCTL cmd would be used to link the guest PASID page table
to the host. For other vendors, it may also be used to support other
kinds of SVM bind requests. This was discussed previously with an ARM
engineer; see the link below. This IOCTL cmd may support an SVM PASID
bind request from a userspace driver, or a page table (cr3) bind request
from a guest. These SVM bind requests would be distinguished by flags,
e.g. VFIO_SVM_BIND_PASID is added to support a PASID bind from a
userspace driver, and VFIO_SVM_BIND_PGTABLE is added to support a page
table bind from a guest.

https://patchwork.kernel.org/patch/9594231/

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 include/uapi/linux/vfio.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..6b97987 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/* IOCTL for Shared Virtual Memory Bind */
+struct vfio_device_svm {
+	__u32	argsz;
+#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
+#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
+#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
+	__u32	flags;
+	__u32	length;
+	__u8	data[];
+};
+
+#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
+				VFIO_SVM_BIND_PASID | \
+				VFIO_SVM_BIND_PGTABLE)
+
+#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
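
As an illustration of the structure added above, here is a minimal
sketch of how a userspace caller might lay out a PASID table bind
request. The struct is a local mirror of vfio_device_svm from the
patch. Setting argsz to header plus payload follows the usual VFIO
convention and is an assumption here, since the posted code does not
read argsz; demo_build_bind_req is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VFIO_SVM_BIND_PASIDTBL	(1 << 0)	/* Bind PASID Table */

/* Local mirror of the uapi structure proposed in this patch. */
struct vfio_device_svm {
	uint32_t argsz;
	uint32_t flags;
	uint32_t length;
	uint8_t  data[];
};

/*
 * Build a bind request carrying len bytes of payload.
 * Returns a calloc'd buffer the caller must free, or NULL on failure.
 */
static struct vfio_device_svm *demo_build_bind_req(const void *payload,
		uint32_t len)
{
	struct vfio_device_svm *req;

	/* refuse payloads whose total size would not fit in argsz */
	if (len > UINT32_MAX - sizeof(*req))
		return NULL;

	req = calloc(1, sizeof(*req) + len);
	if (!req)
		return NULL;

	req->argsz  = (uint32_t)(sizeof(*req) + len);	/* assumed convention */
	req->flags  = VFIO_SVM_BIND_PASIDTBL;
	req->length = len;
	memcpy(req->data, payload, len);
	return req;
}
```

The resulting buffer would then be passed as the argument of the
VFIO_IOMMU_SVM_BIND_TASK ioctl on the container fd, with the payload
being a pasid_table_info as defined in patch 1/8.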

* [RFC PATCH 6/8] VFIO: do pasid table binding
  2017-04-26 10:11 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:12   ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:12 UTC (permalink / raw)
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

This patch adds IOCTL processing in vfio_iommu_type1 for
VFIO_IOMMU_SVM_BIND_TASK. It binds the PASID table by calling
iommu_ops->bind_pasid_table to link the whole PASID table to the
pIOMMU.

For VT-d, this links the guest PASID table to the host pIOMMU.
This is a key step in supporting SVM virtualization on VT-d.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b3cc33f..30b6d48 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+struct vfio_svm_task {
+	struct iommu_domain *domain;
+	void *payload;
+};
+
+static int bind_pasid_tbl_fn(struct device *dev, void *data)
+{
+	int ret = 0;
+	struct vfio_svm_task *task = data;
+	struct pasid_table_info *pasidt_binfo;
+
+	pasidt_binfo = task->payload;
+	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
+	return ret;
+}
+
+static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
+				int (*fn)(struct device *, void *))
+{
+	int ret = 0;
+	struct vfio_domain *d;
+	struct vfio_group *g;
+	struct vfio_svm_task task;
+
+	task.payload = data;
+
+	mutex_lock(&iommu->lock);
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		list_for_each_entry(g, &d->group_list, next) {
+			if (g->iommu_group != NULL) {
+				task.domain = d->domain;
+				ret = iommu_group_for_each_dev(
+					g->iommu_group, &task, fn);
+				if (ret != 0)
+					break;
+			}
+		}
+	}
+
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
+		struct vfio_device_svm hdr;
+		u8 *data = NULL;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_device_svm, length);
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (hdr.length == 0)
+			return -EINVAL;
+
+		data = memdup_user((void __user *)(arg + minsz),
+					hdr.length);
+		if (IS_ERR(data))
+			return PTR_ERR(data);
+
+		switch (hdr.flags & VFIO_SVM_TYPE_MASK) {
+		case VFIO_SVM_BIND_PASIDTBL:
+			ret = vfio_do_svm_task(iommu, data,
+						bind_pasid_tbl_fn);
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+		kfree(data);
+		return ret;
 	}
 
 	return -ENOTTY;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
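
The domain/group/device walk in vfio_do_svm_task() can be modelled in
userspace as three nested loops. The sketch below is an assumption-laden
illustration rather than the patch's code: it returns on the first
failure, whereas the break in the posted hunk leaves only the inner
group loop, so the walk continues with the next domain and a later
result can overwrite ret. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for vfio_group and vfio_domain. */
struct demo_group  { const int *devs; size_t ndevs; };
struct demo_domain { const struct demo_group *groups; size_t ngroups; };

typedef int (*demo_dev_fn)(int dev, void *data);

/* Call fn for every device; stop the whole walk at the first error. */
static int demo_for_each_dev(const struct demo_domain *doms, size_t ndoms,
			     demo_dev_fn fn, void *data)
{
	for (size_t d = 0; d < ndoms; d++) {
		for (size_t g = 0; g < doms[d].ngroups; g++) {
			const struct demo_group *grp = &doms[d].groups[g];

			for (size_t i = 0; i < grp->ndevs; i++) {
				int ret = fn(grp->devs[i], data);

				if (ret)
					return ret;
			}
		}
	}
	return 0;
}

/* Example callback: count visited devices, fail on a chosen device ID. */
struct demo_ctx { int fail_on; int visited; };

static int demo_bind_fn(int dev, void *data)
{
	struct demo_ctx *ctx = data;

	ctx->visited++;
	return dev == ctx->fail_on ? -1 : 0;
}
```

In the kernel the same effect would need the lock to be dropped on the
early exit path, which is why a goto to the unlock label is the more
common shape there.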

* [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-04-26 10:11 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:12   ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:12 UTC (permalink / raw)
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
invalidate requests from the guest to the host.

In the case of SVM virtualization on VT-d, the host IOMMU driver has
no knowledge of caching structure updates unless the guest
invalidation activities are passed down to the host. So a new
IOCTL is needed to propagate guest cache invalidations through
VFIO.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 include/uapi/linux/vfio.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6b97987..50c51f8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -564,6 +564,15 @@ struct vfio_device_svm {
 
 #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/* For IOMMU TLB Invalidation Propagation */
+struct vfio_iommu_tlb_invalidate {
+	__u32	argsz;
+	__u32	length;
+	__u8	data[];
+};
+
+#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
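
Both new commands are defined with _IO(), which encodes neither a
direction nor an argument size. A small sketch of the resulting
numbers, assuming the asm-generic ioctl layout used on x86 (type in
bits 15:8, number in bits 7:0) and the VFIO_TYPE and VFIO_BASE values
from include/uapi/linux/vfio.h; demo_io is a hypothetical stand-in for
the real macro.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VFIO_TYPE	(';')	/* VFIO_TYPE in include/uapi/linux/vfio.h */
#define DEMO_VFIO_BASE	100	/* VFIO_BASE in include/uapi/linux/vfio.h */

/*
 * Equivalent of _IO(type, nr) under the asm-generic encoding:
 * direction and size fields are zero, so only type and nr remain.
 */
static uint32_t demo_io(uint32_t type, uint32_t nr)
{
	return ((type & 0xff) << 8) | (nr & 0xff);
}
```

Because the command number carries no size, the argsz and length
fields in the structure are the only way for the kernel to learn how
much data follows the header.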

* [RFC PATCH 8/8] VFIO: do IOMMU TLB invalidation from guest
  2017-04-26 10:11 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:12   ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:12 UTC (permalink / raw)
  To: kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

This patch adds support for the VFIO_IOMMU_TLB_INVALIDATE cmd in
vfio_iommu_type1.

For SVM virtualization on VT-d, handling VFIO_IOMMU_TLB_INVALIDATE
calls iommu_ops->do_invalidate() to submit the guest IOMMU cache
invalidation to the pIOMMU.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 30b6d48..6cebdfd 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1528,6 +1528,17 @@ static int bind_pasid_tbl_fn(struct device *dev, void *data)
 	return ret;
 }
 
+static int do_tlb_inv_fn(struct device *dev, void *data)
+{
+	int ret = 0;
+	struct vfio_svm_task *task = data;
+	struct tlb_invalidate_info *inv_info;
+
+	inv_info = task->payload;
+	ret = iommu_do_invalidate(task->domain, dev, inv_info);
+	return ret;
+}
+
 static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
 				int (*fn)(struct device *, void *))
 {
@@ -1654,6 +1665,21 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 		kfree(data);
 		return ret;
+	} else if (cmd == VFIO_IOMMU_TLB_INVALIDATE) {
+		struct vfio_iommu_tlb_invalidate hdr;
+		u8 *data = NULL;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_iommu_tlb_invalidate, length);
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+		if (hdr.length == 0)
+			return -EINVAL;
+		data = memdup_user((void __user *)(arg + minsz),
+				hdr.length);
+		ret = vfio_do_svm_task(iommu, data, do_tlb_inv_fn);
+		kfree(data);
+		return ret;
 	}
 
 	return -ENOTTY;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
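
The header-then-payload copy in the ioctl handler above can be modelled
in userspace as follows. This is a sketch under assumptions, not the
posted code: the argsz bounds check is a hypothetical addition, and the
allocation failure path stands in for an IS_ERR() check on the
memdup_user() result, which the posted hunk does not perform. All
demo_* names are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed part of vfio_iommu_tlb_invalidate, i.e. everything before data[]. */
struct demo_tlb_inv_hdr {
	uint32_t argsz;
	uint32_t length;
};

/*
 * Copy the payload out of a caller-supplied buffer of buf_len bytes.
 * Returns 0 and a malloc'd copy in *out, or a negative errno.
 */
static int demo_parse_inv(const uint8_t *buf, size_t buf_len, uint8_t **out)
{
	struct demo_tlb_inv_hdr hdr;
	uint8_t *data;

	if (buf_len < sizeof(hdr))
		return -EINVAL;
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.length == 0)
		return -EINVAL;

	/* hypothetical extra check: payload must fit in what argsz declares */
	if (hdr.argsz < sizeof(hdr) || hdr.argsz > buf_len ||
	    hdr.length > hdr.argsz - sizeof(hdr))
		return -EINVAL;

	data = malloc(hdr.length);
	if (!data)
		return -ENOMEM;	/* analogue of checking IS_ERR() in the kernel */

	memcpy(data, buf + sizeof(hdr), hdr.length);
	*out = data;
	return 0;
}
```

The point of the sketch is the ordering: every length is validated
before the payload is touched, and the copy result is checked before
it is handed to the per-device callback.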

* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-26 10:11   ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 16:56     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-04-26 16:56 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, Jacob Pan, Liu, Yi L

Hi Yi, Jacob,

On 26/04/17 11:11, Liu, Yi L wrote:
> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 
> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> case in the guest:
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> 
> As part of the proposed architecture, when a SVM capable PCI
> device is assigned to a guest, nested mode is turned on. Guest owns the
> first level page tables (request with PASID) and performs GVA->GPA
> translation. Second level page tables are owned by the host for GPA->HPA
> translation for both request with and without PASID.
> 
> A new IOMMU driver interface is therefore needed to perform tasks as
> follows:
> * Enable nested translation and appropriate translation type
> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> 
> This patch introduces new functions called iommu_(un)bind_pasid_table()
> to IOMMU APIs. Architecture specific IOMMU function can be added later
> to perform the specific steps for binding pasid table of assigned devices.
> 
> This patch also adds model definition in iommu.h. It would be used to
> check if the bind request is from a compatible entity. e.g. a bind
> request from an intel_iommu emulator may not be supported by an ARM SMMU
> driver.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index dbe7f65..f2da636 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)

I guess that domain can always be deduced from dev using
iommu_get_domain_for_dev, and doesn't need to be passed as argument?

For the next version of my SVM series, I was thinking of passing group
instead of device to iommu_bind. Since all devices in a group are expected
to share the same mappings (whether they want it or not), users will have
to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
might be simpler to let the IOMMU core take the group lock and do
group->domain->ops->bind_task(dev...) for each device. The question also
holds for iommu_do_invalidate in patch 3/8.

This way the prototypes would be:
int iommu_bind...(struct iommu_group *group, struct ... *info)
int iommu_unbind...(struct iommu_group *group, struct ...*info)
int iommu_invalidate...(struct iommu_group *group, struct ...*info)

For PASID table binding it might not matter much, as VFIO will most likely
be the only user. But task binding will be called by device drivers, which
by now should be encouraged to do things at iommu_group granularity.
Alternatively it could be done implicitly like in iommu_attach_device,
with "iommu_bind_device_x" calling "iommu_bind_group_x".
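
To make the group-granularity idea concrete, here is a minimal userspace
mock-up, not kernel code. The mock_* types, the fixed-size device array and
the name mock_iommu_bind_group() are made up for illustration; the real
iommu_group is opaque and would be walked with iommu_group_for_each_dev
under the group lock. The only point is that the core does the loop (and the
rollback), so callers pass a group instead of a device:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_MAX_DEVS 4

/* Hypothetical stand-ins for struct device / iommu_ops / iommu_group. */
struct mock_device {
	int bound;
};

struct mock_bind_info {
	unsigned long long ptr;
};

struct mock_ops {
	int (*bind)(struct mock_device *dev, struct mock_bind_info *info);
	void (*unbind)(struct mock_device *dev);
};

struct mock_group {
	const struct mock_ops *ops;	/* stands for group->domain->ops */
	struct mock_device *devs[MOCK_MAX_DEVS];
	size_t nr_devs;
};

/* Bind every device in the group; roll back on the first failure. */
static int mock_iommu_bind_group(struct mock_group *group,
				 struct mock_bind_info *info)
{
	size_t i;
	int ret;

	assert(group && info);
	if (!group->ops || !group->ops->bind || !group->ops->unbind)
		return -EINVAL;

	/* The real core would hold the group lock around this loop. */
	for (i = 0; i < group->nr_devs; i++) {
		ret = group->ops->bind(group->devs[i], info);
		if (ret)
			goto err_unbind;
	}
	return 0;

err_unbind:
	/* Undo only the devices that were bound before the failure. */
	while (i--)
		group->ops->unbind(group->devs[i]);
	return ret;
}

/* Trivial driver callbacks used to exercise the loop. */
static int mock_bind(struct mock_device *dev, struct mock_bind_info *info)
{
	(void)info;
	dev->bound = 1;
	return 0;
}

static void mock_unbind(struct mock_device *dev)
{
	dev->bound = 0;
}
```

An "iommu_bind_device_x" wrapper would then just look up the device's group
and call the group variant.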


Extending this reasoning, since groups in a domain are also supposed to
have the same mappings, then similarly to map/unmap,
bind/unbind/invalidate should really be done with an iommu_domain (and
nothing else) as target argument. However this requires the IOMMU core to
keep a group list in each domain, which might complicate things a little
too much.

But "all devices in a domain share the same PASID table" is the paradigm
I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
iommu_group, it should be made more explicit to users, so they don't
assume that devices within a domain are isolated from each other with
regard to PASID DMA.

> +{
> +	if (unlikely(!domain->ops->bind_pasid_table))
> +		return -EINVAL;
> +
> +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> +
> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> +{
> +	if (unlikely(!domain->ops->unbind_pasid_table))
> +		return -EINVAL;
> +
> +	return domain->ops->unbind_pasid_table(domain, dev);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0ff5111..491a011 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -131,6 +131,15 @@ struct iommu_dm_region {
>  	int			prot;
>  };
>  
> +struct pasid_table_info {
> +	__u64	ptr;	/* PASID table ptr */
> +	__u64	size;	/* PASID table size*/
> +	__u32	model;	/* magic number */
> +#define INTEL_IOMMU	(1 << 0)
> +#define ARM_SMMU	(1 << 1)

Not sure if there is any advantage in this being a bitfield rather than
simple values (1, 2, 3, etc).
The names should also have a prefix, such as "PASID_TABLE_MODEL_"
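
Something along these lines is what I have in mind. This is only a sketch:
the enumerator names are my guess at what the prefixed versions could be,
the struct is a userspace mirror of the one in the patch (uint*_t instead of
__u*), and pasid_table_check_model() is a hypothetical helper showing the
kind of compatibility check a driver might do:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical names following the suggested prefix; plain values, not bits. */
enum pasid_table_model {
	PASID_TABLE_MODEL_INTEL_IOMMU = 1,
	PASID_TABLE_MODEL_ARM_SMMU = 2,
};

/* Userspace mirror of the struct in the patch. */
struct pasid_table_info {
	uint64_t ptr;	/* PASID table ptr */
	uint64_t size;	/* PASID table size */
	uint32_t model;	/* one of enum pasid_table_model */
};

/* Reject a bind request whose model the driver does not implement. */
static int pasid_table_check_model(const struct pasid_table_info *info,
				   enum pasid_table_model supported)
{
	assert(info);
	return info->model == (uint32_t)supported ? 0 : -EINVAL;
}
```

With plain values the check is a simple equality test, and there is no
ambiguity about what a model with several bits set would mean.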

Thanks a lot for doing this,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 16:56     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-04-26 16:56 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, Liu, Yi L

On 26/04/17 11:12, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> binding requests.
> 
> On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> to host. While for other vendors, it may also be used to support other
> kind of SVM bind request. Previously, there is a discussion on it with
> ARM engineer. It can be found by the link below. This IOCTL cmd may
> support SVM PASID bind request from userspace driver, or page table(cr3)
> bind request from guest. These SVM bind requests would be supported by
> adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> support page table bind from guest.
> 
> https://patchwork.kernel.org/patch/9594231/
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..6b97987 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/* IOCTL for Shared Virtual Memory Bind */
> +struct vfio_device_svm {
> +	__u32	argsz;
> +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> +	__u32	flags;
> +	__u32	length;
> +	__u8	data[];
> +};
> +
> +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> +				VFIO_SVM_BIND_PASID | \
> +				VFIO_SVM_BIND_PGTABLE)
> +
> +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)

This could be called "VFIO_IOMMU_SVM_BIND", since it will be used both to
bind tables and individual tasks.
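
As a rough illustration of how a single ioctl can serve all bind types, the
flags could be decoded as below. The flag values and the mask are copied
from the patch; svm_bind_type() is a hypothetical helper, and the rule that
exactly one type bit must be set per call is my assumption, not something
the patch states:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values copied from the patch. */
#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */

#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
				VFIO_SVM_BIND_PASID | \
				VFIO_SVM_BIND_PGTABLE)

/*
 * Hypothetical helper: return the single bind type requested in flags,
 * or -EINVAL if zero or several type bits are set.
 */
static int svm_bind_type(uint32_t flags)
{
	uint32_t type = flags & VFIO_SVM_TYPE_MASK;

	/* type & (type - 1) clears the lowest set bit; non-zero means >1 bit */
	if (!type || (type & (type - 1)))
		return -EINVAL;
	return (int)type;
}
```

The caller would then switch on the returned type to pick the table, PASID
or page table path.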

Thanks,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-26 16:56     ` [Qemu-devel] " Jean-Philippe Brucker
@ 2017-04-26 18:29         ` jacob pan
  -1 siblings, 0 replies; 116+ messages in thread
From: jacob pan @ 2017-04-26 18:29 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan, Liu, Yi L, kevin.tian, kvm, jasowang, qemu-devel,
	iommu, jacob.jun.pan

On Wed, 26 Apr 2017 17:56:45 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Hi Yi, Jacob,
> 
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM)
> > use case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns
> > the first level page tables (request with PASID) and performs
> > GVA->GPA translation. Second level page tables are owned by the
> > host for GPA->HPA translation for both request with and without
> > PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called
> > iommu_(un)bind_pasid_table() to IOMMU APIs. Architecture specific
> > IOMMU function can be added later to perform the specific steps for
> > binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used
> > to check if the bind request is from a compatible entity. e.g. a
> > bind request from an intel_iommu emulator may not be supported by
> > an ARM SMMU driver.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> 
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> 
True. The device should already be attached to a domain before this API is
called.

> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are
> expected to share the same mappings (whether they want it or not),
> users will have to do iommu_group_for_each_dev anyway (as you do in
> patch 6/8). So it might be simpler to let the IOMMU core take the
> group lock and do group->domain->ops->bind_task(dev...) for each
> device. The question also holds for iommu_do_invalidate in patch 3/8.
> 
> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> 
Just to understand the implication of this granularity change for fault
notification (e.g. page requests): PRI will be enabled for all devices in
the group, and the IOMMU driver receives page requests per device, with the
same PASID bound to the group. There can be two scenarios:
1. If iommu_bind() is called for a task, the IOMMU driver handles the page
fault internally per device, so there is no need to do anything at group
level, true?
2. If iommu_bind_pasid_table() is called for the device, then we propagate
the PRQ to VFIO per device.
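
In toy form, the per-device routing I am asking about would look roughly
like this. It is a userspace model only: the mock_* enums, the struct and
mock_route_page_request() are made-up names, and the REJECT case for an
unbound device is my own addition:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how a device was bound; not a kernel type. */
enum mock_bind_mode {
	MOCK_BIND_NONE,
	MOCK_BIND_TASK,		/* scenario 1: bound to a host task */
	MOCK_BIND_PASID_TABLE,	/* scenario 2: guest PASID table bound */
};

enum mock_prq_route {
	MOCK_PRQ_REJECT,		/* nothing bound, fault cannot be served */
	MOCK_PRQ_HANDLE_IN_DRIVER,	/* resolve the fault in the host */
	MOCK_PRQ_FORWARD_TO_VFIO,	/* let the guest resolve the fault */
};

struct mock_device {
	enum mock_bind_mode mode;
};

/* Decide, per device, where a page request should go. */
static enum mock_prq_route mock_route_page_request(const struct mock_device *dev)
{
	assert(dev != NULL);

	switch (dev->mode) {
	case MOCK_BIND_TASK:
		return MOCK_PRQ_HANDLE_IN_DRIVER;
	case MOCK_BIND_PASID_TABLE:
		return MOCK_PRQ_FORWARD_TO_VFIO;
	default:
		return MOCK_PRQ_REJECT;
	}
}
```

In both scenarios the decision is made per device, even if the bind itself
was done at group level.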


> For PASID table binding it might not matter much, as VFIO will most
> likely be the only user. But task binding will be called by device
> drivers, which by now should be encouraged to do things at
> iommu_group granularity. Alternatively it could be done implicitly
> like in iommu_attach_device, with "iommu_bind_device_x" calling
> "iommu_bind_group_x".
> 
> 
> Extending this reasoning, since groups in a domain are also supposed
> to have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU
> core to keep a group list in each domain, which might complicate
> things a little too much.
> 
> But "all devices in a domain share the same PASID table" is the
> paradigm I'm currently using in the guts of arm-smmu-v3. And I wonder
> if, as with iommu_group, it should be made more explicit to users, so
> they don't assume that devices within a domain are isolated from each
> others with regard to PASID DMA.
> 
> > +{
> > +	if (unlikely(!domain->ops->bind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	if (unlikely(!domain->ops->unbind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> >  	int			prot;
> >  };
> >  
> > +struct pasid_table_info {
> > +	__u64	ptr;	/* PASID table ptr */
> > +	__u64	size;	/* PASID table size*/
> > +	__u32	model;	/* magic number */
> > +#define INTEL_IOMMU	(1 << 0)
> > +#define ARM_SMMU	(1 << 1)
> 
> Not sure if there is any advantage in this being a bitfield rather
> than simple values (1, 2, 3, etc).
> The names should also have a prefix, such as "PASID_TABLE_MODEL_"
> 
> Thanks a lot for doing this,
> Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-26 18:29         ` [Qemu-devel] " jacob pan
@ 2017-04-26 18:59             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-04-26 18:59 UTC (permalink / raw)
  To: jacob pan
  Cc: tianyu.lan, Liu, Yi L, kevin.tian, kvm, jasowang, qemu-devel,
	iommu

On 26/04/17 19:29, jacob pan wrote:
> On Wed, 26 Apr 2017 17:56:45 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> Hi Yi, Jacob,
>>
>> On 26/04/17 11:11, Liu, Yi L wrote:
>>> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>>
>>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM)
>>> use case in the guest:
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
>>>
>>> As part of the proposed architecture, when a SVM capable PCI
>>> device is assigned to a guest, nested mode is turned on. Guest owns
>>> the first level page tables (request with PASID) and performs
>>> GVA->GPA translation. Second level page tables are owned by the
>>> host for GPA->HPA translation for both request with and without
>>> PASID.
>>>
>>> A new IOMMU driver interface is therefore needed to perform tasks as
>>> follows:
>>> * Enable nested translation and appropriate translation type
>>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
>>>
>>> This patch introduces new functions called
>>> iommu_(un)bind_pasid_table() to IOMMU APIs. Architecture specific
>>> IOMMU function can be added later to perform the specific steps for
>>> binding pasid table of assigned devices.
>>>
>>> This patch also adds model definition in iommu.h. It would be used
>>> to check if the bind request is from a compatible entity. e.g. a
>>> bind request from an intel_iommu emulator may not be supported by
>>> an ARM SMMU driver.
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>>>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
>>>  2 files changed, 50 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index dbe7f65..f2da636 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>>>  }
>>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>>>  
>>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
>>> +			struct pasid_table_info *pasidt_binfo)
>>
>> I guess that domain can always be deduced from dev using
>> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
>>
> true. device should have attached a domain before calling this API.
>> For the next version of my SVM series, I was thinking of passing group
>> instead of device to iommu_bind. Since all devices in a group are
>> expected to share the same mappings (whether they want it or not),
>> users will have to do iommu_group_for_each_dev anyway (as you do in
>> patch 6/8). So it might be simpler to let the IOMMU core take the
>> group lock and do group->domain->ops->bind_task(dev...) for each
>> device. The question also holds for iommu_do_invalidate in patch 3/8.
>>
>> This way the prototypes would be:
>> int iommu_bind...(struct iommu_group *group, struct ... *info)
>> int iommu_unbind...(struct iommu_group *group, struct ...*info)
>> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
>>
> Just to understand this granularity implication of fault notification
> (e.g. page request) of this change. PRI for all devices in the group
> will be enabled. IOMMU driver receives page request per device with the
> same PASID bond to the group. There can be two scenarios:
> 1. If iommu_bind() to a task, IOMMU driver handles page fault
> internally per device, there is no need to do group level, true?

Yes, we find the task corresponding to the PASID, and call handle_mm_fault
on it.

> 2. If the device iommu_bind_pasid_table() is called, then we propagate
> PRQ to VFIO per device.

I guess so. It could be reported on the container instead, but the guest
IOMMU driver probably wants to know which device triggered the fault anyway.

The implication of having a group granularity instead of device is that
after the fault is handled, all other devices in the group are also able
to access the region that was just mapped.
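
As a toy model of that implication (userspace only; the mock_* types, the
fixed page count and both helpers are invented for illustration and do not
correspond to kernel structures): the mappings live in the group, so a fault
served for one device changes what every device in the group can reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NR_PAGES 16

/* Hypothetical stand-ins: one shared set of mappings per group. */
struct mock_group {
	bool mapped[MOCK_NR_PAGES];
};

struct mock_device {
	struct mock_group *group;
};

/* Serve a page request from one device by mapping the page for the group. */
static int mock_handle_page_request(struct mock_device *dev, size_t page)
{
	assert(dev && dev->group);
	if (page >= MOCK_NR_PAGES)
		return -1;
	dev->group->mapped[page] = true;
	return 0;
}

/* Access check: depends only on the group, not on which device asks. */
static bool mock_can_access(const struct mock_device *dev, size_t page)
{
	assert(dev && dev->group);
	return page < MOCK_NR_PAGES && dev->group->mapped[page];
}
```

If device A faults on a page and the fault is handled, a second device B in
the same group passes the access check for that page without ever having
faulted on it.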

If I understand correctly, unlike putting multiple groups in a domain,
putting multiple devices in a group is generally not a software choice.
Usually with PCIe there is a single device per group, but in some cases
(lack of ACS isolation, legacy PCI, bugs), functions cannot be
distinguished by the IOMMU, or can snoop each other's DMA. If this is the
case they need to be put in the same group by the bus driver.

Thanks,
Jean

>> For PASID table binding it might not matter much, as VFIO will most
>> likely be the only user. But task binding will be called by device
>> drivers, which by now should be encouraged to do things at
>> iommu_group granularity. Alternatively it could be done implicitly
>> like in iommu_attach_device, with "iommu_bind_device_x" calling
>> "iommu_bind_group_x".
>>
>>
>> Extending this reasoning, since groups in a domain are also supposed
>> to have the same mappings, then similarly to map/unmap,
>> bind/unbind/invalidate should really be done with an iommu_domain (and
>> nothing else) as target argument. However this requires the IOMMU
>> core to keep a group list in each domain, which might complicate
>> things a little too much.
>>
>> But "all devices in a domain share the same PASID table" is the
>> paradigm I'm currently using in the guts of arm-smmu-v3. And I wonder
>> if, as with iommu_group, it should be made more explicit to users, so
>> they don't assume that devices within a domain are isolated from each
>> other with regard to PASID DMA.
>>
>>> +{
>>> +	if (unlikely(!domain->ops->bind_pasid_table))
>>> +		return -EINVAL;
>>> +
>>> +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
>>> +
>>> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
>>> +{
>>> +	if (unlikely(!domain->ops->unbind_pasid_table))
>>> +		return -EINVAL;
>>> +
>>> +	return domain->ops->unbind_pasid_table(domain, dev);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
>>> +
>>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>>  				  struct device *dev)
>>>  {
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 0ff5111..491a011 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -131,6 +131,15 @@ struct iommu_dm_region {
>>>  	int			prot;
>>>  };
>>>  
>>> +struct pasid_table_info {
>>> +	__u64	ptr;	/* PASID table ptr */
>>> +	__u64	size;	/* PASID table size*/
>>> +	__u32	model;	/* magic number */
>>> +#define INTEL_IOMMU	(1 << 0)
>>> +#define ARM_SMMU	(1 << 1)
>>
>> Not sure if there is any advantage in this being a bitfield rather
>> than simple values (1, 2, 3, etc).
>> The names should also have a prefix, such as "PASID_TABLE_MODEL_"
>>
>> Thanks a lot for doing this,
>> Jean
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-04-26 16:56     ` [Qemu-devel] " Jean-Philippe Brucker
  (?)
@ 2017-04-27  5:43     ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-27  5:43 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, tianyu.lan,
	kevin.tian, ashok.raj, jasowang, qemu-devel, jacob.jun.pan

On Wed, Apr 26, 2017 at 05:56:50PM +0100, Jean-Philippe Brucker wrote:
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> > 
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> > 
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there is a discussion on it with
> > ARM engineer. It can be found by the link below. This IOCTL cmd may
> > support SVM PASID bind request from userspace driver, or page table(cr3)
> > bind request from guest. These SVM bind requests would be supported by
> > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> > support page table bind from guest.
> > 
> > https://patchwork.kernel.org/patch/9594231/
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  include/uapi/linux/vfio.h | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +	__u32	argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> > +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > +	__u32	flags;
> > +	__u32	length;
> > +	__u8	data[];
> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> > +				VFIO_SVM_BIND_PASID | \
> > +				VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> 
> This could be called "VFIO_IOMMU_SVM_BIND, since it will be used both to
> bind tables and individual tasks.

Yes, agreed. I will rename it in the next version.
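
Since a single VFIO_IOMMU_SVM_BIND would then cover several bind types,
the flags need checking on entry. A minimal sketch follows; the flag
values are copied from the patch, while svm_bind_flags_valid() is a
hypothetical helper, and "exactly one type bit per call" is my
assumption rather than something the patch states:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the patch under review. */
#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */

#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
				VFIO_SVM_BIND_PASID | \
				VFIO_SVM_BIND_PGTABLE)

/* Hypothetical helper: accept only a single, known bind type. */
static bool svm_bind_flags_valid(uint32_t flags)
{
	uint32_t type = flags & VFIO_SVM_TYPE_MASK;

	if (flags & ~(uint32_t)VFIO_SVM_TYPE_MASK)
		return false;		/* unknown bits are set */

	/* Non-zero and a power of two means exactly one type bit. */
	return type && !(type & (type - 1));
}
```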

Thanks,
Yi L 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-26 16:56     ` [Qemu-devel] " Jean-Philippe Brucker
@ 2017-04-27  6:36       ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-27  6:36 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, jasowang,
	qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan, tianyu.lan,
	Jacob Pan

On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
> 
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> 
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> 
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have
> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.
> 
> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> 
> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> 
> 
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
> 
> But "all devices in a domain share the same PASID table" is the paradigm
> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> iommu_group, it should be made more explicit to users, so they don't
> assume that devices within a domain are isolated from each other with
> regard to PASID DMA.
> 
> > +{
> > +	if (unlikely(!domain->ops->bind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	if (unlikely(!domain->ops->unbind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> >  	int			prot;
> >  };
> >  
> > +struct pasid_table_info {
> > +	__u64	ptr;	/* PASID table ptr */
> > +	__u64	size;	/* PASID table size*/
> > +	__u32	model;	/* magic number */
> > +#define INTEL_IOMMU	(1 << 0)
> > +#define ARM_SMMU	(1 << 1)
> 
> Not sure if there is any advantage in this being a bitfield rather than
> simple values (1, 2, 3, etc).
> The names should also have a prefix, such as "PASID_TABLE_MODEL_"

Hi Jean,

For the value, there is no special reason on my side, so it is fine to
just use plain values (1, 2, ...).
For the prefix, the model definition may also be needed for IOMMU TLB
invalidate propagation, so a prefix like "SVM_MODEL_" may be more
suitable. Does that make sense?
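
For illustration only, plain values with that prefix might look like the
sketch below. The enum name, the SVM_MODEL_* identifiers and
svm_model_compatible() are hypothetical; only the pasid_table_info
fields come from the patch (with stdint types so it builds in
userspace):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names with the "SVM_MODEL_" prefix; plain values, not bits. */
enum svm_model {
	SVM_MODEL_INTEL_IOMMU = 1,
	SVM_MODEL_ARM_SMMU = 2,
};

/* Same fields as in the patch. */
struct pasid_table_info {
	uint64_t ptr;	/* PASID table ptr */
	uint64_t size;	/* PASID table size */
	uint32_t model;	/* one of enum svm_model */
};

/* Hypothetical check an IOMMU driver could run on a bind request. */
static int svm_model_compatible(const struct pasid_table_info *info,
				enum svm_model supported)
{
	return info->model == (uint32_t)supported;
}
```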

Thanks,
Yi L

> Thanks a lot for doing this,
> Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-27  6:36       ` [Qemu-devel] " Liu, Yi L
@ 2017-04-27 10:12         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-04-27 10:12 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On 27/04/17 07:36, Liu, Yi L wrote:
> On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
>> Hi Yi, Jacob,
>>
>> On 26/04/17 11:11, Liu, Yi L wrote:
>>> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>>
>>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
>>> case in the guest:
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
>>>
>>> As part of the proposed architecture, when a SVM capable PCI
>>> device is assigned to a guest, nested mode is turned on. Guest owns the
>>> first level page tables (request with PASID) and performs GVA->GPA
>>> translation. Second level page tables are owned by the host for GPA->HPA
>>> translation for both request with and without PASID.
>>>
>>> A new IOMMU driver interface is therefore needed to perform tasks as
>>> follows:
>>> * Enable nested translation and appropriate translation type
>>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
>>>
>>> This patch introduces new functions called iommu_(un)bind_pasid_table()
>>> to IOMMU APIs. Architecture specific IOMMU function can be added later
>>> to perform the specific steps for binding pasid table of assigned devices.
>>>
>>> This patch also adds model definition in iommu.h. It would be used to
>>> check if the bind request is from a compatible entity. e.g. a bind
>>> request from an intel_iommu emulator may not be supported by an ARM SMMU
>>> driver.
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>>>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
>>>  2 files changed, 50 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index dbe7f65..f2da636 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>>>  }
>>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>>>  
>>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
>>> +			struct pasid_table_info *pasidt_binfo)
>>
>> I guess that domain can always be deduced from dev using
>> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
>>
>> For the next version of my SVM series, I was thinking of passing group
>> instead of device to iommu_bind. Since all devices in a group are expected
>> to share the same mappings (whether they want it or not), users will have
>> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
>> might be simpler to let the IOMMU core take the group lock and do
>> group->domain->ops->bind_task(dev...) for each device. The question also
>> holds for iommu_do_invalidate in patch 3/8.
>>
>> This way the prototypes would be:
>> int iommu_bind...(struct iommu_group *group, struct ... *info)
>> int iommu_unbind...(struct iommu_group *group, struct ...*info)
>> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
>>
>> For PASID table binding it might not matter much, as VFIO will most likely
>> be the only user. But task binding will be called by device drivers, which
>> by now should be encouraged to do things at iommu_group granularity.
>> Alternatively it could be done implicitly like in iommu_attach_device,
>> with "iommu_bind_device_x" calling "iommu_bind_group_x".
>>
>>
>> Extending this reasoning, since groups in a domain are also supposed to
>> have the same mappings, then similarly to map/unmap,
>> bind/unbind/invalidate should really be done with an iommu_domain (and
>> nothing else) as target argument. However this requires the IOMMU core to
>> keep a group list in each domain, which might complicate things a little
>> too much.
>>
>> But "all devices in a domain share the same PASID table" is the paradigm
>> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
>> iommu_group, it should be made more explicit to users, so they don't
>> assume that devices within a domain are isolated from each other with
>> regard to PASID DMA.
>>
>>> +{
>>> +	if (unlikely(!domain->ops->bind_pasid_table))
>>> +		return -EINVAL;
>>> +
>>> +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
>>> +
>>> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
>>> +{
>>> +	if (unlikely(!domain->ops->unbind_pasid_table))
>>> +		return -EINVAL;
>>> +
>>> +	return domain->ops->unbind_pasid_table(domain, dev);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
>>> +
>>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>>  				  struct device *dev)
>>>  {
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 0ff5111..491a011 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -131,6 +131,15 @@ struct iommu_dm_region {
>>>  	int			prot;
>>>  };
>>>  
>>> +struct pasid_table_info {
>>> +	__u64	ptr;	/* PASID table ptr */
>>> +	__u64	size;	/* PASID table size*/
>>> +	__u32	model;	/* magic number */
>>> +#define INTEL_IOMMU	(1 << 0)
>>> +#define ARM_SMMU	(1 << 1)
>>
>> Not sure if there is any advantage in this being a bitfield rather than
>> simple values (1, 2, 3, etc).
>> The names should also have a prefix, such as "PASID_TABLE_MODEL_"
> 
> Hi Jean,
> 
> For the value, no special reason from my side. so it is fine to just
> use (1,2..).
> For the prefix, model definition may also needed for iommu tlb
> invalidate propagation. So it may be suitable to have a prefix like
> "SVM_MODEL_" or so? Does it make sense?

Sure, SVM_MODEL would do. However, once the PASID table is bound and the
model has been negotiated, do we need to repeat the model in each
invalidation? At that point user and driver have already agreed on the
model to use (and the various structure formats), so they could just
transmit opaque structures and each end of the pipe would know how to
interpret them.
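
Roughly what I have in mind, as a userspace sketch: the model is checked
and recorded once at bind time, and later invalidations carry only an
opaque payload. Every name here (svm_binding, svm_bind, svm_invalidate,
the SVM_MODEL_* values) is hypothetical, and the payload parsing is left
out:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SVM_MODEL_INTEL_IOMMU	1	/* hypothetical values */
#define SVM_MODEL_ARM_SMMU	2

/* Per-binding state kept by the driver; model 0 means "not bound". */
struct svm_binding {
	uint32_t model;
};

/* The model is negotiated once, when the PASID table is bound. */
static int svm_bind(struct svm_binding *b, uint32_t requested,
		    uint32_t supported)
{
	if (requested != supported)
		return -EINVAL;
	b->model = requested;
	return 0;
}

/*
 * Invalidation takes an opaque payload with no model field. Returns the
 * model whose format applies to the payload, or -EINVAL if nothing is
 * bound or the payload is empty.
 */
static int svm_invalidate(const struct svm_binding *b, const void *data,
			  size_t len)
{
	if (!b->model || !data || !len)
		return -EINVAL;

	/* A real driver would now parse data in the format of b->model. */
	return (int)b->model;
}
```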

Thanks,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-27 10:12         ` [Qemu-devel] " Jean-Philippe Brucker
@ 2017-04-28  7:59             ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-28  7:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan, kevin.tian, Liu, Yi L, ashok.raj, kvm, jasowang,
	alex.williamson, peterx, qemu-devel, iommu, Jacob Pan,
	jacob.jun.pan

On Thu, Apr 27, 2017 at 11:12:45AM +0100, Jean-Philippe Brucker wrote:
> On 27/04/17 07:36, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when a SVM capable PCI
> >>> device is assigned to a guest, nested mode is turned on. Guest owns the
> >>> first level page tables (request with PASID) and performs GVA->GPA
> >>> translation. Second level page tables are owned by the host for GPA->HPA
> >>> translation for both request with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to IOMMU APIs. Architecture specific IOMMU function can be added later
> >>> to perform the specific steps for binding pasid table of assigned devices.
> >>>
> >>> This patch also adds model definition in iommu.h. It would be used to
> >>> check if the bind request is from a compatible entity. e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >>> ---
> >>>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >>>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>  
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> >>> +			struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> >> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> >> might be simpler to let the IOMMU core take the group lock and do
> >> group->domain->ops->bind_task(dev...) for each device. The question also
> >> holds for iommu_do_invalidate in patch 3/8.
> >>
> >> This way the prototypes would be:
> >> int iommu_bind...(struct iommu_group *group, struct ... *info)
> >> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> >> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> >>
> >> For PASID table binding it might not matter much, as VFIO will most likely
> >> be the only user. But task binding will be called by device drivers, which
> >> by now should be encouraged to do things at iommu_group granularity.
> >> Alternatively it could be done implicitly like in iommu_attach_device,
> >> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> >>
> >>
> >> Extending this reasoning, since groups in a domain are also supposed to
> >> have the same mappings, then similarly to map/unmap,
> >> bind/unbind/invalidate should really be done with an iommu_domain (and
> >> nothing else) as target argument. However this requires the IOMMU core to
> >> keep a group list in each domain, which might complicate things a little
> >> too much.
> >>
> >> But "all devices in a domain share the same PASID table" is the paradigm
> >> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> >> iommu_group, it should be made more explicit to users, so they don't
> >> assume that devices within a domain are isolated from each others with
> >> regard to PASID DMA.
> >>
> >>> +{
> >>> +	if (unlikely(!domain->ops->bind_pasid_table))
> >>> +		return -EINVAL;
> >>> +
> >>> +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> >>> +
> >>> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> >>> +{
> >>> +	if (unlikely(!domain->ops->unbind_pasid_table))
> >>> +		return -EINVAL;
> >>> +
> >>> +	return domain->ops->unbind_pasid_table(domain, dev);
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> >>> +
> >>>  static void __iommu_detach_device(struct iommu_domain *domain,
> >>>  				  struct device *dev)
> >>>  {
> >>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >>> index 0ff5111..491a011 100644
> >>> --- a/include/linux/iommu.h
> >>> +++ b/include/linux/iommu.h
> >>> @@ -131,6 +131,15 @@ struct iommu_dm_region {
> >>>  	int			prot;
> >>>  };
> >>>  
> >>> +struct pasid_table_info {
> >>> +	__u64	ptr;	/* PASID table ptr */
> >>> +	__u64	size;	/* PASID table size*/
> >>> +	__u32	model;	/* magic number */
> >>> +#define INTEL_IOMMU	(1 << 0)
> >>> +#define ARM_SMMU	(1 << 1)
> >>
> >> Not sure if there is any advantage in this being a bitfield rather than
> >> simple values (1, 2, 3, etc).
> >> The names should also have a prefix, such as "PASID_TABLE_MODEL_"
> > 
> > Hi Jean,
> > 
> > For the value, no special reason from my side. so it is fine to just
> > use (1,2..).
> > For the prefix, model definition may also needed for iommu tlb
> > invalidate propagation. So it may be suitable to have a prefix like
> > "SVM_MODEL_" or so? Does it make sense?
> 
> Sure, SVM_MODEL would do. However, once the PASID table is bound and the
> model has been negotiated, do we need to repeat the model in each
> invalidation? At that point user and driver already agreed on the model to
> use (and the various structure formats), so they could just transmit
> opaque structures and each end of the pipe would know how to interpret it.

Jean,

Yes, that works if the invalidate API is only used for SVM. But I remember
there may also be a requirement from gIOVA: if the guest invalidates a large
range of gIOVA mappings, doing it through unmap is inefficient. Tianyu Lan may
be able to provide details about this case. That's why I kept the model check
in the invalidate path as well. Let's double-check whether gIOVA really needs
this. If not, I agree with your suggested method.
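
The two positions can be put side by side in a minimal user-space sketch. It
assumes plain-value model identifiers with an "SVM_MODEL_" prefix, as proposed
in the quoted text. Every name below (enum svm_model, struct invalidate_req,
struct domain_state, invalidate_model) is hypothetical and not part of the
patchset. Once a PASID table is bound, the negotiated model is reused and the
per-request tag is ignored. Without a bind (the possible gIOVA case), the tag
in the request is checked against the model the host driver implements.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model identifiers: plain values sharing one prefix. */
enum svm_model {
	SVM_MODEL_NONE      = 0,	/* nothing negotiated yet */
	SVM_MODEL_INTEL_VTD = 1,
	SVM_MODEL_ARM_SMMU  = 2,
};

/* Hypothetical invalidation request: model tag plus opaque payload. */
struct invalidate_req {
	uint32_t model;
	uint8_t  opaque[64];	/* layout is defined by the model */
};

/* Hypothetical per-domain state kept by the host driver. */
struct domain_state {
	uint32_t bound_model;	/* set when a PASID table is bound */
	uint32_t native_model;	/* model the host driver implements */
};

/*
 * Return the model to use for decoding req->opaque, or -EINVAL.
 * If a PASID table was bound, the model agreed on at bind time wins.
 * Otherwise the request must name the driver's own model.
 */
int invalidate_model(const struct domain_state *st,
		     const struct invalidate_req *req)
{
	if (st->bound_model != SVM_MODEL_NONE)
		return (int)st->bound_model;
	if (req->model != st->native_model)
		return -EINVAL;
	return (int)req->model;
}
```

If gIOVA turns out not to need the invalidate path, the second branch goes
away and the request can carry only the opaque payload.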

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-26 16:56     ` [Qemu-devel] " Jean-Philippe Brucker
@ 2017-04-28  9:04         ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-04-28  9:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, tianyu.lan,
	kevin.tian, Jacob Pan, ashok.raj, jasowang, qemu-devel,
	jacob.jun.pan

On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi, Jacob,
> 
> On 26/04/17 11:11, Liu, Yi L wrote:
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> 
> I guess that domain can always be deduced from dev using
> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> 
> For the next version of my SVM series, I was thinking of passing group
> instead of device to iommu_bind. Since all devices in a group are expected
> to share the same mappings (whether they want it or not), users will have

A virtual address space is not tied to a protection domain the way an I/O
virtual address space is. Is it really necessary to affect all the devices in
this group, or is it just for consistency?

> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> might be simpler to let the IOMMU core take the group lock and do
> group->domain->ops->bind_task(dev...) for each device. The question also
> holds for iommu_do_invalidate in patch 3/8.

As I understand it, this moves the for_each_dev loop into the IOMMU driver.
Is that right?
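
A minimal user-space sketch of the group-level bind suggested in the quoted
text, using mock "struct device" and "struct iommu_group" types rather than
the kernel's. mock_bind, mock_unbind and group_bind are hypothetical names.
Undoing earlier binds when one device fails is an assumption that the thread
does not discuss.

```c
#include <errno.h>
#include <stddef.h>

/* Mock device: a negative id makes the mock bind fail. */
struct device {
	int id;
	int bound;
};

/* Mock group: just the list of devices it contains. */
struct iommu_group {
	struct device **devs;
	size_t ndevs;
};

/* Stand-in for the driver's per-device bind callback. */
int mock_bind(struct device *dev, void *info)
{
	(void)info;
	if (dev->id < 0)
		return -ENODEV;
	dev->bound = 1;
	return 0;
}

/* Stand-in for the driver's per-device unbind callback. */
void mock_unbind(struct device *dev)
{
	dev->bound = 0;
}

/*
 * Bind every device in the group. The caller passes a group and no
 * longer walks the devices itself. A real implementation would hold
 * the group lock around the loop.
 */
int group_bind(struct iommu_group *group, void *info)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < group->ndevs; i++) {
		ret = mock_bind(group->devs[i], info);
		if (ret)
			break;
	}

	/* On failure, undo the binds that already succeeded. */
	if (ret) {
		while (i--)
			mock_unbind(group->devs[i]);
	}
	return ret;
}
```

With this shape the caller in patch 6/8 would no longer need its own
iommu_group_for_each_dev loop.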

> This way the prototypes would be:
> int iommu_bind...(struct iommu_group *group, struct ... *info)
> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> int iommu_invalidate...(struct iommu_group *group, struct ...*info)

For PASID table binding from the guest, I think it would be better as a
per-device op, since the bind operation needs to modify the host context
entry. But we could still share the API and handle it differently in the
IOMMU driver.

For invalidation, I think per-group would be better. In fact, when a guest
IOMMU exists, there is only one group in a domain on Intel platforms, so
invalidating for each device is not expected. How is it on ARM?
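
The difference in granularity can be sketched in user space as follows. All
names are hypothetical: struct dev_ctx stands in for a host context entry,
and the flush counter only makes the number of invalidations visible. This is
a sketch of the reasoning above, not of the VT-d driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state, standing in for a host context entry. */
struct dev_ctx {
	uint64_t pasid_table_gpa;	/* guest PASID table pointer (GPA) */
	int nested;			/* nested translation enabled */
};

/* Hypothetical group: its devices plus a count of flushes issued. */
struct dev_group {
	struct dev_ctx *ctx;
	size_t ndevs;
	unsigned int flushes;
};

/* Bind touches every device, since each has its own context entry. */
void group_bind_pasid_table(struct dev_group *g, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < g->ndevs; i++) {
		g->ctx[i].pasid_table_gpa = gpa;
		g->ctx[i].nested = 1;
	}
}

/* Invalidation is issued once for the group, not once per device. */
void group_invalidate(struct dev_group *g)
{
	g->flushes++;
}
```

Both entry points take the group, so the API shape is shared while the
per-device work stays inside the bind path.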

> For PASID table binding it might not matter much, as VFIO will most likely
> be the only user. But task binding will be called by device drivers, which
> by now should be encouraged to do things at iommu_group granularity.
> Alternatively it could be done implicitly like in iommu_attach_device,
> with "iommu_bind_device_x" calling "iommu_bind_group_x".

Do you mean binding a task from a userspace driver? I guess you're trying to
handle different types of bind requests in a single svm_bind API?

> 
> Extending this reasoning, since groups in a domain are also supposed to
> have the same mappings, then similarly to map/unmap,
> bind/unbind/invalidate should really be done with an iommu_domain (and
> nothing else) as target argument. However this requires the IOMMU core to
> keep a group list in each domain, which might complicate things a little
> too much.
> 
> But "all devices in a domain share the same PASID table" is the paradigm
> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> iommu_group, it should be made more explicit to users, so they don't
> assume that devices within a domain are isolated from each others with
> regard to PASID DMA.

Does the isolation you mention mean forbidding PASID DMA to the same virtual
address space when the devices come from different domains?

Thanks,
Yi L
 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-28  9:04         ` Liu, Yi L
@ 2017-04-28 12:51           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-04-28 12:51 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On 28/04/17 10:04, Liu, Yi L wrote:
> On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
>> Hi Yi, Jacob,
>>
>> On 26/04/17 11:11, Liu, Yi L wrote:
>>> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>>
>>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
>>> case in the guest:
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
>>>
>>> As part of the proposed architecture, when a SVM capable PCI
>>> device is assigned to a guest, nested mode is turned on. Guest owns the
>>> first level page tables (request with PASID) and performs GVA->GPA
>>> translation. Second level page tables are owned by the host for GPA->HPA
>>> translation for both request with and without PASID.
>>>
>>> A new IOMMU driver interface is therefore needed to perform tasks as
>>> follows:
>>> * Enable nested translation and appropriate translation type
>>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
>>>
>>> This patch introduces new functions called iommu_(un)bind_pasid_table()
>>> to IOMMU APIs. Architecture specific IOMMU function can be added later
>>> to perform the specific steps for binding pasid table of assigned devices.
>>>
>>> This patch also adds model definition in iommu.h. It would be used to
>>> check if the bind request is from a compatible entity. e.g. a bind
>>> request from an intel_iommu emulator may not be supported by an ARM SMMU
>>> driver.
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>>>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
>>>  2 files changed, 50 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index dbe7f65..f2da636 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>>>  }
>>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>>>  
>>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
>>> +			struct pasid_table_info *pasidt_binfo)
>>
>> I guess that domain can always be deduced from dev using
>> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
>>
>> For the next version of my SVM series, I was thinking of passing group
>> instead of device to iommu_bind. Since all devices in a group are expected
>> to share the same mappings (whether they want it or not), users will have
> 
> A virtual address space is not tied to a protection domain the way an I/O
> virtual address space is. Is it really necessary to affect all the devices
> in this group, or is it just for consistency?

It's mostly about consistency, and also about not hiding implicit behavior
in the IOMMU driver. Here is an example, described using the group and
domain structures from the IOMMU API:
                 ____________________
                |IOMMU  ____________ |
                |      |DOM  ______ ||
                |      |    |GRP   |||     bind
                |      |    |    A<-----------------Task 1
                |      |    |    B |||
                |      |    |______|||
                |      |     ______ ||
                |      |    |GRP   |||
                |      |    |    C |||
                |      |    |______|||
                |      |____________||
                |       ____________ |
                |      |DOM  ______ ||
                |      |    |GRP   |||
                |      |    |    D |||
                |      |    |______|||
                |      |____________||
                |____________________|

Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
to some hardware limitation (in the bus, the device or the IOMMU), B can
see all DMA transactions issued by A. A and B are therefore in the same
IOMMU group. C and D can be isolated by the IOMMU, so they each have their
own group.

(As far as I know, in the SVM world at the moment, devices are neatly
integrated and there is no need for putting multiple devices in the same
IOMMU group, but I don't think we should expect all future SVM systems to
be well-behaved.)

So when a user binds Task 1 to device A, it is *implicitly* giving device
B access to Task 1 as well, simply because the IOMMU is unable to isolate
A from B, PASID or not. B could access the same address space as A, even
if you don't call bind again to explicitly attach the PASID table to B.

If the bind is done with device as argument, maybe users will believe that
using PASIDs provides an additional level of isolation within a group,
when it really doesn't. That's why I'm inclined to have the whole bind API
be on groups rather than devices, if only for clarity.

But I don't know, maybe a comment explaining the above would be sufficient.

To be frank, my comment about group versus device is partly to make sure
that I grasp the various concepts correctly and that we're on the same
page. Doing the bind on groups is less significant in your case, for PASID
table binding, because VFIO already takes care of IOMMU groups properly. In
my case I expect DRM, network and DMA drivers to use the API as well for
binding tasks, and I don't want to introduce ambiguity in the API that
would lead to security holes later.
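
As a rough illustration of the group-granularity argument above, here is a
minimal userspace sketch. It is not kernel code: toy_dev, toy_group,
toy_bind_group and toy_bind_device are hypothetical names invented for this
example, and the model only records which PASID each device can use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel's struct iommu_group. */
#define TOY_MAX_DEVS 4

struct toy_dev {
	const char *name;
	int bound_pasid;	/* 0 means "no task bound" */
};

struct toy_group {
	struct toy_dev *devs[TOY_MAX_DEVS];
	size_t ndevs;
};

/*
 * Bind at group granularity: every device in the group gets the PASID,
 * which makes the lack of isolation inside a group explicit to callers.
 */
static int toy_bind_group(struct toy_group *grp, int pasid)
{
	size_t i;

	if (!grp || pasid <= 0)
		return -1;

	for (i = 0; i < grp->ndevs; i++)
		grp->devs[i]->bound_pasid = pasid;

	return 0;
}

/* Device-level entry point that simply defers to the group-level one. */
static int toy_bind_device(struct toy_group *grp_of_dev, int pasid)
{
	return toy_bind_group(grp_of_dev, pasid);
}
```

With A and B in one group and C in another, binding through A's group also
marks B as bound, while C stays untouched until its own group is bound.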

>> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
>> might be simpler to let the IOMMU core take the group lock and do
>> group->domain->ops->bind_task(dev...) for each device. The question also
>> holds for iommu_do_invalidate in patch 3/8.
> 
> As I understand it, this moves the for_each_dev loop into the iommu driver.
> Is that right?

Yes, that's what I meant.

>> This way the prototypes would be:
>> int iommu_bind...(struct iommu_group *group, struct ... *info)
>> int iommu_unbind...(struct iommu_group *group, struct ...*info)
>> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> 
> For PASID table binding from the guest, I think it'd be better as a
> per-device op, since the bind operation needs to modify the host context
> entry. But we may still share the API and do things differently in the
> iommu driver.

Sure. As said above, the use cases for PASID table binding and single PASID
binding are different, so sharing the API is not strictly necessary.

> For invalidation, I think it'd be better to be per-group. Actually, when a
> guest IOMMU exists, there is only one group in a domain on Intel platforms.
> Doing it for each device is not expected. How about on ARM?

In ARM systems with the DMA API (IOMMU_DOMAIN_DMA), there is one group per
domain. But with VFIO (IOMMU_DOMAIN_UNMANAGED), VFIO will try to attach
multiple groups in the same container to the same domain when possible.

>> For PASID table binding it might not matter much, as VFIO will most likely
>> be the only user. But task binding will be called by device drivers, which
>> by now should be encouraged to do things at iommu_group granularity.
>> Alternatively it could be done implicitly like in iommu_attach_device,
>> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> 
> Do you mean binding a task from a userspace driver? I guess you're trying to
> handle different types of binding requests in a single svm_bind API?
> 
>>
>> Extending this reasoning, since groups in a domain are also supposed to
>> have the same mappings, then similarly to map/unmap,
>> bind/unbind/invalidate should really be done with an iommu_domain (and
>> nothing else) as target argument. However this requires the IOMMU core to
>> keep a group list in each domain, which might complicate things a little
>> too much.
>>
>> But "all devices in a domain share the same PASID table" is the paradigm
>> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
>> iommu_group, it should be made more explicit to users, so they don't
>> assume that devices within a domain are isolated from each other with
>> regard to PASID DMA.
> 
> Does the isolation you mentioned mean forbidding PASID DMA to the same
> virtual address space when the devices come from different domains?

In the above example, devices A, B and C are in the same IOMMU domain
(because, for instance, the user put the two groups in the same VFIO
container). Then in the SMMUv3 driver they would all share the same PASID
table. A, B and C can access Task 1 with the PASID obtained during the
depicted bind. Users don't need to call bind again for device C, though it
would be good practice.

But D is in a different domain, so unless you also call bind on Task 1 for
device D, there is no way that D can access Task 1.
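
The "one PASID table per domain" paradigm described above can be sketched
as follows. This is a userspace toy under stated assumptions, not the
arm-smmu-v3 implementation: toy_domain, toy_dev, toy_bind and toy_lookup
are hypothetical names, and an address space is represented by an opaque
pointer.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PASID 8

/* Hypothetical model: one PASID table per domain, shared by its devices. */
struct toy_domain {
	const void *pasid_table[TOY_MAX_PASID];	/* PASID -> address space */
};

struct toy_dev {
	struct toy_domain *domain;
};

/* Install an address space under a PASID in the device's domain table. */
static int toy_bind(struct toy_dev *dev, int pasid, const void *task_mm)
{
	if (!dev || !dev->domain || pasid < 0 || pasid >= TOY_MAX_PASID)
		return -1;

	dev->domain->pasid_table[pasid] = task_mm;
	return 0;
}

/* Which address space does DMA from dev, tagged with pasid, reach? */
static const void *toy_lookup(const struct toy_dev *dev, int pasid)
{
	if (!dev || !dev->domain || pasid < 0 || pasid >= TOY_MAX_PASID)
		return NULL;

	return dev->domain->pasid_table[pasid];
}
```

In this model a bind issued for A is visible to B and C because all three
point at the same domain, whereas a lookup for D returns NULL until a bind
is issued against D's own domain.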

Thanks,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
  2017-05-08  4:09   ` [Qemu-devel] " Xiao Guangrong
  (?)
@ 2017-05-07  7:33   ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-07  7:33 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, tianyu.lan,
	kevin.tian, ashok.raj, jean-philippe.brucker, jasowang,
	qemu-devel, jacob.jun.pan

On Mon, May 08, 2017 at 12:09:42PM +0800, Xiao Guangrong wrote:
> 
> Hi Liu Yi,
> 
> I haven't started reading the code yet, but could you give more detail,
> please? Does it emulate an SVM-capable IOMMU device in a VM? Does it
> speed up a device's DMA access in a VM? Or is it a new facility
> introduced for VMs? Could you please say a bit more about its usage?

Hi Guangrong,

Nice to hear from you.

This patchset is part of the overall SVM virtualization work, which aims
to expose an SVM-capable Intel IOMMU to the guest. And yes, it is an
emulated IOMMU.

For a detailed introduction to SVM and SVM virtualization, please see the
link below.

http://www.spinics.net/lists/kvm/msg148798.html

As for usage, I can give an example with IGD. The latest IGD is an
SVM-capable device. On bare metal (where the Intel IOMMU is also SVM
capable), an application such as an OpenCL application can request to share
its virtual address (an allocated buffer) with the IGD device through the
IOCTL cmd provided by the IGD driver. When IGD is assigned to a guest, the
same usage is expected to work in the guest. With the SVM virtualization
patchset, an application in the guest is also able to share its virtual
address with the IGD device. Unlike on bare metal, it is sharing a GVA with
IGD, so the hardware IOMMU has to translate GVA to HPA and therefore needs
to know the GVA->HPA mapping. This patchset makes sure the GVA->HPA mapping
is built and maintains the TLB.
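
To make the GVA->HPA path concrete, here is a toy userspace model of the
nested translation: a guest-owned first level (GVA->GPA) followed by a
host-owned second level (GPA->HPA). It is only a sketch under simplifying
assumptions: single-level tables, 4 KiB pages, and no second-level
translation of the first-level table pointers themselves, which real
hardware also performs. All names (toy_pgtable, toy_walk,
toy_nested_translate) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES	4
#define TOY_FAULT	UINT64_MAX

/* Single-level table: index is the input page number, value the output
 * page frame number, or TOY_FAULT if the page is unmapped. */
struct toy_pgtable {
	uint64_t pfn[TOY_NPAGES];
};

/* Translate one address through one table, keeping the page offset. */
static uint64_t toy_walk(const struct toy_pgtable *pt, uint64_t addr)
{
	uint64_t vpn = addr >> TOY_PAGE_SHIFT;

	if (vpn >= TOY_NPAGES || pt->pfn[vpn] == TOY_FAULT)
		return TOY_FAULT;

	return (pt->pfn[vpn] << TOY_PAGE_SHIFT) | (addr & TOY_PAGE_MASK);
}

/* First level (guest-owned) gives GVA->GPA, second level (host-owned)
 * gives GPA->HPA; a fault at either level fails the whole translation. */
static uint64_t toy_nested_translate(const struct toy_pgtable *first_level,
				     const struct toy_pgtable *second_level,
				     uint64_t gva)
{
	uint64_t gpa = toy_walk(first_level, gva);

	if (gpa == TOY_FAULT)
		return TOY_FAULT;

	return toy_walk(second_level, gpa);
}
```

The point of the sketch is that the guest only ever fills in the first
table and the host only the second, yet a device request with PASID needs
both to reach host memory.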

Feel free to let me know if you want more detail.

Thanks,
Yi L

> 
> Thanks!
> 
> On 04/26/2017 06:11 PM, Liu, Yi L wrote:
> >Hi,
> >
> >This patchset introduces SVM virtualization for intel_iommu in
> >IOMMU/VFIO. The total SVM virtualization for intel_iommu touched
> >Qemu/IOMMU/VFIO.
> >
> >Another patchset would change the Qemu. It is "[RFC PATCH 0/20] Qemu:
> >Extend intel_iommu emulator to support Shared Virtual Memory"
> >
> >In this patchset, it adds two new IOMMU APIs and their implementation
> >in intel_iommu driver. In VFIO, it adds two IOCTL cmd attached on
> >container->fd to propagate data from QEMU to kernel space.
> >
> >[Patch Overview]
> >* 1 adds iommu API definition for binding guest PASID table
> >* 2 adds binding PASID table API implementation in VT-d iommu driver
> >* 3 adds iommu API definition to do IOMMU TLB invalidation from guest
> >* 4 adds IOMMU TLB invalidation implementation in VT-d iommu driver
> >* 5 adds VFIO IOCTL for propagating PASID table binding from guest
> >* 6 adds processing of pasid table binding in vfio_iommu_type1
> >* 7 adds VFIO IOCTL for propagating IOMMU TLB invalidation from guest
> >* 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1
> >
> >Best Wishes,
> >Yi L
> >
> >
> >Jacob Pan (3):
> >   iommu: Introduce bind_pasid_table API function
> >   iommu/vt-d: add bind_pasid_table function
> >   iommu/vt-d: Add iommu do invalidate function
> >
> >Liu, Yi L (5):
> >   iommu: Introduce iommu do invalidate API function
> >   VFIO: Add new IOTCL for PASID Table bind propagation
> >   VFIO: do pasid table binding
> >   VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
> >   VFIO: do IOMMU TLB invalidation from guest
> >
> >  drivers/iommu/intel-iommu.c     | 146 ++++++++++++++++++++++++++++++++++++++++
> >  drivers/iommu/iommu.c           |  32 +++++++++
> >  drivers/vfio/vfio_iommu_type1.c |  98 +++++++++++++++++++++++++++
> >  include/linux/dma_remapping.h   |   1 +
> >  include/linux/intel-iommu.h     |  11 +++
> >  include/linux/iommu.h           |  47 +++++++++++++
> >  include/uapi/linux/vfio.h       |  26 +++++++
> >  7 files changed, 361 insertions(+)
> >
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
  2017-04-26 10:11 ` [Qemu-devel] " Liu, Yi L
@ 2017-05-08  4:09   ` Xiao Guangrong
  -1 siblings, 0 replies; 116+ messages in thread
From: Xiao Guangrong @ 2017-05-08  4:09 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, jean-philippe.brucker


Hi Liu Yi,

I haven't started reading the code yet, but could you give more detail,
please? Does it emulate an SVM-capable IOMMU device in a VM? Does it
speed up a device's DMA access in a VM? Or is it a new facility
introduced for VMs? Could you please say a bit more about its usage?

Thanks!

On 04/26/2017 06:11 PM, Liu, Yi L wrote:
> Hi,
> 
> This patchset introduces SVM virtualization for intel_iommu in
> IOMMU/VFIO. The total SVM virtualization for intel_iommu touched
> Qemu/IOMMU/VFIO.
> 
> Another patchset would change the Qemu. It is "[RFC PATCH 0/20] Qemu:
> Extend intel_iommu emulator to support Shared Virtual Memory"
> 
> In this patchset, it adds two new IOMMU APIs and their implementation
> in intel_iommu driver. In VFIO, it adds two IOCTL cmd attached on
> container->fd to propagate data from QEMU to kernel space.
> 
> [Patch Overview]
> * 1 adds iommu API definition for binding guest PASID table
> * 2 adds binding PASID table API implementation in VT-d iommu driver
> * 3 adds iommu API definition to do IOMMU TLB invalidation from guest
> * 4 adds IOMMU TLB invalidation implementation in VT-d iommu driver
> * 5 adds VFIO IOCTL for propagating PASID table binding from guest
> * 6 adds processing of pasid table binding in vfio_iommu_type1
> * 7 adds VFIO IOCTL for propagating IOMMU TLB invalidation from guest
> * 8 adds processing of IOMMU TLB invalidation in vfio_iommu_type1
> 
> Best Wishes,
> Yi L
> 
> 
> Jacob Pan (3):
>    iommu: Introduce bind_pasid_table API function
>    iommu/vt-d: add bind_pasid_table function
>    iommu/vt-d: Add iommu do invalidate function
> 
> Liu, Yi L (5):
>    iommu: Introduce iommu do invalidate API function
>    VFIO: Add new IOTCL for PASID Table bind propagation
>    VFIO: do pasid table binding
>    VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
>    VFIO: do IOMMU TLB invalidation from guest
> 
>   drivers/iommu/intel-iommu.c     | 146 ++++++++++++++++++++++++++++++++++++++++
>   drivers/iommu/iommu.c           |  32 +++++++++
>   drivers/vfio/vfio_iommu_type1.c |  98 +++++++++++++++++++++++++++
>   include/linux/dma_remapping.h   |   1 +
>   include/linux/intel-iommu.h     |  11 +++
>   include/linux/iommu.h           |  47 +++++++++++++
>   include/uapi/linux/vfio.h       |  26 +++++++
>   7 files changed, 361 insertions(+)
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 6/8] VFIO: do pasid table binding
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-09  7:55     ` Xiao Guangrong
  -1 siblings, 0 replies; 116+ messages in thread
From: Xiao Guangrong @ 2017-05-09  7:55 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, jean-philippe.brucker, Liu, Yi L



On 04/26/2017 06:12 PM, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds IOCTL processing in vfio_iommu_type1 for
> VFIO_IOMMU_SVM_BIND_TASK. Binds the PASID table bind by
> calling iommu_ops->bind_pasid_table to link the whole
> PASID table to pIOMMU.
> 
> For VT-d, it is linking the guest PASID table to host pIOMMU.
> This is key point to support SVM virtualization on VT-d.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>   drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 72 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b3cc33f..30b6d48 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>   	return ret;
>   }
>   
> +struct vfio_svm_task {
> +	struct iommu_domain *domain;
> +	void *payload;
> +};
> +
> +static int bind_pasid_tbl_fn(struct device *dev, void *data)
> +{
> +	int ret = 0;
> +	struct vfio_svm_task *task = data;
> +	struct pasid_table_info *pasidt_binfo;
> +
> +	pasidt_binfo = task->payload;
> +	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
> +	return ret;
> +}
> +
> +static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
> +				int (*fn)(struct device *, void *))
> +{
> +	int ret = 0;
> +	struct vfio_domain *d;
> +	struct vfio_group *g;
> +	struct vfio_svm_task task;
> +
> +	task.payload = data;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		list_for_each_entry(g, &d->group_list, next) {
> +			if (g->iommu_group != NULL) {
> +				task.domain = d->domain;
> +				ret = iommu_group_for_each_dev(
> +					g->iommu_group, &task, fn);
> +				if (ret != 0)
> +					break;
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>   static long vfio_iommu_type1_ioctl(void *iommu_data,
>   				   unsigned int cmd, unsigned long arg)
>   {
> @@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   
>   		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
> +		struct vfio_device_svm hdr;
> +		u8 *data = NULL;
> +		int ret = 0;
> +
> +		minsz = offsetofend(struct vfio_device_svm, length);
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (hdr.length == 0)
> +			return -EINVAL;
> +
> +		data = memdup_user((void __user *)(arg + minsz),
> +					hdr.length);

You should check that @length is at least sizeof(struct pasid_table_info),
since the kernel uses the copied buffer as a pasid_table_info. Otherwise a
malicious application can crash the kernel.
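
A minimal sketch of the kind of check being asked for, written as plain
userspace C so it can be compiled on its own. The struct below is a
hypothetical stand-in (the real struct pasid_table_info comes from patch
1/8 and its fields are not assumed here), and toy_check_bind_length is an
invented helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pasid_table_info; the field layout
 * is an assumption made only so that sizeof() has something to measure. */
struct toy_pasid_table_info {
	uint64_t ptr;
	uint64_t size;
	uint32_t model;
};

/*
 * Reject a user-supplied payload length that is too short to hold the
 * structure the buffer will later be interpreted as.
 */
static int toy_check_bind_length(uint32_t length)
{
	if (length < sizeof(struct toy_pasid_table_info))
		return -EINVAL;

	return 0;
}
```

In the quoted ioctl path such a test would sit next to the existing
hdr.length == 0 check, before the memdup_user() call, so that a short
buffer is never cast to the larger structure.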

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] VFIO: do pasid table binding
@ 2017-05-09  7:55     ` Xiao Guangrong
  0 siblings, 0 replies; 116+ messages in thread
From: Xiao Guangrong @ 2017-05-09  7:55 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, jean-philippe.brucker, Liu, Yi L



On 04/26/2017 06:12 PM, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds IOCTL processing in vfio_iommu_type1 for
> VFIO_IOMMU_SVM_BIND_TASK. Binds the PASID table bind by
> calling iommu_ops->bind_pasid_table to link the whole
> PASID table to pIOMMU.
> 
> For VT-d, it is linking the guest PASID table to host pIOMMU.
> This is key point to support SVM virtualization on VT-d.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>   drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 72 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b3cc33f..30b6d48 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>   	return ret;
>   }
>   
> +struct vfio_svm_task {
> +	struct iommu_domain *domain;
> +	void *payload;
> +};
> +
> +static int bind_pasid_tbl_fn(struct device *dev, void *data)
> +{
> +	int ret = 0;
> +	struct vfio_svm_task *task = data;
> +	struct pasid_table_info *pasidt_binfo;
> +
> +	pasidt_binfo = task->payload;
> +	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
> +	return ret;
> +}
> +
> +static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
> +				int (*fn)(struct device *, void *))
> +{
> +	int ret = 0;
> +	struct vfio_domain *d;
> +	struct vfio_group *g;
> +	struct vfio_svm_task task;
> +
> +	task.payload = data;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		list_for_each_entry(g, &d->group_list, next) {
> +			if (g->iommu_group != NULL) {
> +				task.domain = d->domain;
> +				ret = iommu_group_for_each_dev(
> +					g->iommu_group, &task, fn);
> +				if (ret != 0)
> +					break;
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>   static long vfio_iommu_type1_ioctl(void *iommu_data,
>   				   unsigned int cmd, unsigned long arg)
>   {
> @@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   
>   		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
> +		struct vfio_device_svm hdr;
> +		u8 *data = NULL;
> +		int ret = 0;
> +
> +		minsz = offsetofend(struct vfio_device_svm, length);
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (hdr.length == 0)
> +			return -EINVAL;
> +
> +		data = memdup_user((void __user *)(arg + minsz),
> +					hdr.length);

You should check that @length is at least sizeof(struct pasid_table_info),
since the kernel uses the data as a pasid_table_info; otherwise an evil
application can crash the kernel.


* Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-11 10:29       ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-11 10:29 UTC (permalink / raw)
  To: alex.williamson-H+wXaHxf7aLQT0dZR+AlfA
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Wed, Apr 26, 2017 at 06:12:02PM +0800, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

Hi Alex,

In this patchset, I'm trying to add two new IOCTL cmds for Shared
Virtual Memory virtualization: one for binding the guest PASID table
and one for IOMMU TLB invalidation from the guest. ARM has a similar
requirement for SVM support. Since this touches VFIO, I'd like
to hear your comments on the VFIO changes.

Thanks,
Yi L

> This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> binding requests.
> 
> On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> to host. While for other vendors, it may also be used to support other
> kind of SVM bind request. Previously, there is a discussion on it with
> ARM engineer. It can be found by the link below. This IOCTL cmd may
> support SVM PASID bind request from userspace driver, or page table(cr3)
> bind request from guest. These SVM bind requests would be supported by
> adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> support page table bind from guest.
> 
> https://patchwork.kernel.org/patch/9594231/
> 
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
>  include/uapi/linux/vfio.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..6b97987 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/* IOCTL for Shared Virtual Memory Bind */
> +struct vfio_device_svm {
> +	__u32	argsz;
> +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> +	__u32	flags;
> +	__u32	length;
> +	__u8	data[];
> +};
> +
> +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> +				VFIO_SVM_BIND_PASID | \
> +				VFIO_SVM_BIND_PGTABLE)
> +
> +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> -- 
> 1.9.1
> 
> 


* Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
@ 2017-05-11 10:29       ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-11 10:29 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, iommu, peterx, tianyu.lan, kevin.tian, ashok.raj,
	jean-philippe.brucker, jasowang, qemu-devel, jacob.jun.pan

On Wed, Apr 26, 2017 at 06:12:02PM +0800, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

Hi Alex,

In this patchset, I'm trying to add two new IOCTL cmds for Shared
Virtual Memory virtualization: one for binding the guest PASID table
and one for IOMMU TLB invalidation from the guest. ARM has a similar
requirement for SVM support. Since this touches VFIO, I'd like
to hear your comments on the VFIO changes.

Thanks,
Yi L

> This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> binding requests.
> 
> On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> to host. While for other vendors, it may also be used to support other
> kind of SVM bind request. Previously, there is a discussion on it with
> ARM engineer. It can be found by the link below. This IOCTL cmd may
> support SVM PASID bind request from userspace driver, or page table(cr3)
> bind request from guest. These SVM bind requests would be supported by
> adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> support page table bind from guest.
> 
> https://patchwork.kernel.org/patch/9594231/
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..6b97987 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/* IOCTL for Shared Virtual Memory Bind */
> +struct vfio_device_svm {
> +	__u32	argsz;
> +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> +	__u32	flags;
> +	__u32	length;
> +	__u8	data[];
> +};
> +
> +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> +				VFIO_SVM_BIND_PASID | \
> +				VFIO_SVM_BIND_PGTABLE)
> +
> +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> -- 
> 1.9.1
> 
> 


* Re: [Qemu-devel] [RFC PATCH 6/8] VFIO: do pasid table binding
  2017-05-09  7:55     ` [Qemu-devel] " Xiao Guangrong
  (?)
@ 2017-05-11 10:29     ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-11 10:29 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, tianyu.lan,
	kevin.tian, ashok.raj, jean-philippe.brucker, jasowang,
	qemu-devel, jacob.jun.pan

On Tue, May 09, 2017 at 03:55:20PM +0800, Xiao Guangrong wrote:
> 
> 
> On 04/26/2017 06:12 PM, Liu, Yi L wrote:
> >From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> >
> >This patch adds IOCTL processing in vfio_iommu_type1 for
> >VFIO_IOMMU_SVM_BIND_TASK. Binds the PASID table bind by
> >calling iommu_ops->bind_pasid_table to link the whole
> >PASID table to pIOMMU.
> >
> >For VT-d, it is linking the guest PASID table to host pIOMMU.
> >This is key point to support SVM virtualization on VT-d.
> >
> >Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >---
> >  drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 72 insertions(+)
> >
> >diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >index b3cc33f..30b6d48 100644
> >--- a/drivers/vfio/vfio_iommu_type1.c
> >+++ b/drivers/vfio/vfio_iommu_type1.c
> >@@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >  	return ret;
> >  }
> >+struct vfio_svm_task {
> >+	struct iommu_domain *domain;
> >+	void *payload;
> >+};
> >+
> >+static int bind_pasid_tbl_fn(struct device *dev, void *data)
> >+{
> >+	int ret = 0;
> >+	struct vfio_svm_task *task = data;
> >+	struct pasid_table_info *pasidt_binfo;
> >+
> >+	pasidt_binfo = task->payload;
> >+	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
> >+	return ret;
> >+}
> >+
> >+static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
> >+				int (*fn)(struct device *, void *))
> >+{
> >+	int ret = 0;
> >+	struct vfio_domain *d;
> >+	struct vfio_group *g;
> >+	struct vfio_svm_task task;
> >+
> >+	task.payload = data;
> >+
> >+	mutex_lock(&iommu->lock);
> >+
> >+	list_for_each_entry(d, &iommu->domain_list, next) {
> >+		list_for_each_entry(g, &d->group_list, next) {
> >+			if (g->iommu_group != NULL) {
> >+				task.domain = d->domain;
> >+				ret = iommu_group_for_each_dev(
> >+					g->iommu_group, &task, fn);
> >+				if (ret != 0)
> >+					break;
> >+			}
> >+		}
> >+	}
> >+
> >+	mutex_unlock(&iommu->lock);
> >+	return ret;
> >+}
> >+
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  				   unsigned int cmd, unsigned long arg)
> >  {
> >@@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
> >  			-EFAULT : 0;
> >+	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
> >+		struct vfio_device_svm hdr;
> >+		u8 *data = NULL;
> >+		int ret = 0;
> >+
> >+		minsz = offsetofend(struct vfio_device_svm, length);
> >+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> >+			return -EFAULT;
> >+
> >+		if (hdr.length == 0)
> >+			return -EINVAL;
> >+
> >+		data = memdup_user((void __user *)(arg + minsz),
> >+					hdr.length);
> 
> You should check the @length is at least sizeof(struct pasid_table_info) as
> kernel uses it as pasid_table_info, a evil application can crash kernel.

Yes, thanks for the reminder.

Thanks,
Yi L 


* Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 12:11     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-12 12:11 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, Liu, Yi L

Hi Yi,

On 26/04/17 11:12, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> invalidate request from guest to host.
> 
> In the case of SVM virtualization on VT-d, host IOMMU driver has
> no knowledge of caching structure updates unless the guest
> invalidation activities are passed down to the host. So a new
> IOCTL is needed to propagate the guest cache invalidation through
> VFIO.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 6b97987..50c51f8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>  
>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>  
> +/* For IOMMU TLB Invalidation Propagation */
> +struct vfio_iommu_tlb_invalidate {
> +	__u32	argsz;
> +	__u32	length;
> +	__u8	data[];
> +};

We initially discussed something a little more generic than this, with
most info explicitly described and only pIOMMU-specific quirks and hints
in an opaque structure. Out of curiosity, why the change? I'm not against
a fully opaque structure, but there seems to be a large overlap between TLB
invalidations across architectures.


For what it's worth, when prototyping the paravirtualized IOMMU I came up
with the following.

(From the paravirtualized POV, the SMMU also has to swizzle endianness
after unpacking an opaque structure, since userspace doesn't know what's
in it and the guest might use a different endianness. So we need to force
all opaque data to be e.g. little-endian.)

struct vfio_iommu_tlb_invalidate {
	__u32	argsz;
	__u32	scope;
	__u32	flags;
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	__u8	data[];
};

Scope is a bitfield restricting the invalidation scope. By default,
invalidate the whole container (all PASIDs and all VAs); in that case
@pasid, @vaddr and @size are unused.

Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
scope to the pasid described by @pasid.
Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
scope to the address range described by (@vaddr, @size).

So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
range for *all* pasids (as well as no_pasid). Setting scope =
(VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
the VA range only for @pasid.

Flags depend on the selected scope:

VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
without scope or with INVALIDATE_VADDR) targets non-pasid mappings
exclusively (some architectures, e.g. SMMU, allow this)

VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
to invalidate all intermediate tables cached as part of the PTW for vaddr,
only the last-level entry (pte). This is a hint.
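
To make the scope semantics concrete, here is a small userspace sketch of
how a request could be matched against a cached entry. The two scope bits
and the field names come from the proposal above; the struct name
demo_tlb_invalidate, the helper demo_inval_hits, and the idea of testing
one (pasid, address) pair at a time are assumptions made only for
illustration. The flags (NO_PASID, VADDR_LEAF) are left out, since their
values are not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scope bits as given in the proposal. */
#define VFIO_IOMMU_INVALIDATE_PASID	(1u << 0)
#define VFIO_IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Hypothetical cut-down request: only the fields the scope bits refer to. */
struct demo_tlb_invalidate {
	uint32_t scope;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
};

/*
 * Return true if a cached translation for (pasid, addr) falls inside the
 * invalidation request. With scope == 0 everything in the container is
 * hit; each scope bit narrows the match.
 */
bool demo_inval_hits(const struct demo_tlb_invalidate *req,
		     uint32_t pasid, uint64_t addr)
{
	if ((req->scope & VFIO_IOMMU_INVALIDATE_PASID) &&
	    pasid != req->pasid)
		return false;

	if ((req->scope & VFIO_IOMMU_INVALIDATE_VADDR) &&
	    (addr < req->vaddr || addr - req->vaddr >= req->size))
		return false;

	return true;
}
```

With scope = VFIO_IOMMU_INVALIDATE_VADDR alone, the PASID argument is
ignored and only the address range is compared, which matches the "all
pasids" case described above.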

I guess what's missing for Intel IOMMU and would go in @data is the
"global" hint (which we don't have in SMMU invalidations). Do you see
anything else that the pIOMMU cannot deduce from this structure?

Thanks,
Jean


> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> 


* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-05-12 12:11     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-12 12:11 UTC (permalink / raw)
  To: Liu, Yi L, kvm, iommu, alex.williamson, peterx
  Cc: jasowang, qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, Liu, Yi L

Hi Yi,

On 26/04/17 11:12, Liu, Yi L wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> invalidate request from guest to host.
> 
> In the case of SVM virtualization on VT-d, host IOMMU driver has
> no knowledge of caching structure updates unless the guest
> invalidation activities are passed down to the host. So a new
> IOCTL is needed to propagate the guest cache invalidation through
> VFIO.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 6b97987..50c51f8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>  
>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>  
> +/* For IOMMU TLB Invalidation Propagation */
> +struct vfio_iommu_tlb_invalidate {
> +	__u32	argsz;
> +	__u32	length;
> +	__u8	data[];
> +};

We initially discussed something a little more generic than this, with
most info explicitly described and only pIOMMU-specific quirks and hints
in an opaque structure. Out of curiosity, why the change? I'm not against
a fully opaque structure, but there seems to be a large overlap between TLB
invalidations across architectures.


For what it's worth, when prototyping the paravirtualized IOMMU I came up
with the following.

(From the paravirtualized POV, the SMMU also has to swizzle endianness
after unpacking an opaque structure, since userspace doesn't know what's
in it and the guest might use a different endianness. So we need to force
all opaque data to be e.g. little-endian.)

struct vfio_iommu_tlb_invalidate {
	__u32	argsz;
	__u32	scope;
	__u32	flags;
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	__u8	data[];
};

Scope is a bitfield restricting the invalidation scope. By default,
invalidate the whole container (all PASIDs and all VAs); in that case
@pasid, @vaddr and @size are unused.

Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
scope to the pasid described by @pasid.
Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
scope to the address range described by (@vaddr, @size).

So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
range for *all* pasids (as well as no_pasid). Setting scope =
(VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
the VA range only for @pasid.

Flags depend on the selected scope:

VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
without scope or with INVALIDATE_VADDR) targets non-pasid mappings
exclusively (some architectures, e.g. SMMU, allow this)

VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
to invalidate all intermediate tables cached as part of the PTW for vaddr,
only the last-level entry (pte). This is a hint.

I guess what's missing for Intel IOMMU and would go in @data is the
"global" hint (which we don't have in SMMU invalidations). Do you see
anything else that the pIOMMU cannot deduce from this structure?

Thanks,
Jean


> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> 


* Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:58     ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:58 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Liu, Yi L

On Wed, 26 Apr 2017 18:12:04 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> invalidate request from guest to host.
> 
> In the case of SVM virtualization on VT-d, host IOMMU driver has
> no knowledge of caching structure updates unless the guest
> invalidation activities are passed down to the host. So a new
> IOCTL is needed to propagate the guest cache invalidation through
> VFIO.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 6b97987..50c51f8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>  
>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>  
> +/* For IOMMU TLB Invalidation Propagation */
> +struct vfio_iommu_tlb_invalidate {
> +	__u32	argsz;
> +	__u32	length;
> +	__u8	data[];
> +};
> +
> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)

I'm kind of wondering why this isn't just a new flag bit on
vfio_device_svm, the data structure is so similar.  Of course data
needs to be fully specified in uapi.

> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*


* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-05-12 21:58     ` Alex Williamson
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:58 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Liu, Yi L

On Wed, 26 Apr 2017 18:12:04 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> invalidate request from guest to host.
> 
> In the case of SVM virtualization on VT-d, host IOMMU driver has
> no knowledge of caching structure updates unless the guest
> invalidation activities are passed down to the host. So a new
> IOCTL is needed to propagate the guest cache invalidation through
> VFIO.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 6b97987..50c51f8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>  
>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>  
> +/* For IOMMU TLB Invalidation Propagation */
> +struct vfio_iommu_tlb_invalidate {
> +	__u32	argsz;
> +	__u32	length;
> +	__u8	data[];
> +};
> +
> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)

I'm kind of wondering why this isn't just a new flag bit on
vfio_device_svm, the data structure is so similar.  Of course data
needs to be fully specified in uapi.

> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*


* Re: [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:58       ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:58 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w, Liu, Yi L,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Wed, 26 Apr 2017 18:12:02 +0800
"Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> 
> This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> binding requests.
> 
> On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> to host. While for other vendors, it may also be used to support other
> kind of SVM bind request. Previously, there is a discussion on it with
> ARM engineer. It can be found by the link below. This IOCTL cmd may
> support SVM PASID bind request from userspace driver, or page table(cr3)
> bind request from guest. These SVM bind requests would be supported by
> adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> support page table bind from guest.
> 
> https://patchwork.kernel.org/patch/9594231/
> 
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
>  include/uapi/linux/vfio.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..6b97987 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/* IOCTL for Shared Virtual Memory Bind */
> +struct vfio_device_svm {
> +	__u32	argsz;
> +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> +	__u32	flags;
> +	__u32	length;
> +	__u8	data[];

In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
pasid_table_info?  So at a minimum this is a union including struct
pasid_table_info.  Furthermore how does a user learn what the opaque
data in struct pasid_table_info is without looking at the code?  A user
API needs to be clear and documented, not opaque and variable.  We
should also have references to the hardware spec for an Intel or ARM
PASID table in uapi.  flags should be defined as they're used, let's
not reserve them with the expectation of future use.
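
As a rough illustration of the direction suggested here, the sketch below
replaces the opaque data[] with a typed union member and defines only the
one flag that is actually used. Everything prefixed demo_ is hypothetical,
including the contents of the stand-in pasid_table_info; it is userspace C
meant only to show the shape, not a proposed uapi.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VFIO_SVM_BIND_PASIDTBL	(1u << 0)	/* the only flag defined so far */

/* Hypothetical, fully specified stand-in for struct pasid_table_info. */
struct demo_pasid_table_info {
	uint64_t ptr;		/* assumed: guest PASID table pointer */
	uint32_t size;		/* assumed: table size */
	uint32_t model;		/* assumed: IOMMU model identifier */
};

/* Hypothetical request with a typed union instead of __u8 data[]. */
struct demo_vfio_device_svm {
	uint32_t argsz;
	uint32_t flags;
	union {
		/* valid when flags == VFIO_SVM_BIND_PASIDTBL */
		struct demo_pasid_table_info pasid_tbl;
	} u;
};

/*
 * Accept a request only if it is large enough to hold the typed payload
 * and uses exactly the one defined flag. Returns 0 or -EINVAL.
 */
int demo_svm_validate(const struct demo_vfio_device_svm *req)
{
	if (req->argsz < sizeof(*req))
		return -EINVAL;
	if (req->flags != VFIO_SVM_BIND_PASIDTBL)
		return -EINVAL;
	return 0;
}
```

With a typed member, the size check falls out of argsz and there is no
separate length field for userspace to get wrong.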

> +};
> +
> +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> +				VFIO_SVM_BIND_PASID | \
> +				VFIO_SVM_BIND_PGTABLE)
> +
> +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*


* Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
@ 2017-05-12 21:58       ` Alex Williamson
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:58 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Liu, Yi L

On Wed, 26 Apr 2017 18:12:02 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> binding requests.
> 
> On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> to host. While for other vendors, it may also be used to support other
> kind of SVM bind request. Previously, there is a discussion on it with
> ARM engineer. It can be found by the link below. This IOCTL cmd may
> support SVM PASID bind request from userspace driver, or page table(cr3)
> bind request from guest. These SVM bind requests would be supported by
> adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> support page table bind from guest.
> 
> https://patchwork.kernel.org/patch/9594231/
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/uapi/linux/vfio.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..6b97987 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/* IOCTL for Shared Virtual Memory Bind */
> +struct vfio_device_svm {
> +	__u32	argsz;
> +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> +	__u32	flags;
> +	__u32	length;
> +	__u8	data[];

In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
pasid_table_info?  So at a minimum this is a union including struct
pasid_table_info.  Furthermore how does a user learn what the opaque
data in struct pasid_table_info is without looking at the code?  A user
API needs to be clear and documented, not opaque and variable.  We
should also have references to the hardware spec for an Intel or ARM
PASID table in uapi.  flags should be defined as they're used, let's
not reserve them with the expectation of future use.

> +};
> +
> +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> +				VFIO_SVM_BIND_PASID | \
> +				VFIO_SVM_BIND_PGTABLE)
> +
> +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*


* Re: [RFC PATCH 6/8] VFIO: do pasid table binding
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:59     ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Liu, Yi L

On Wed, 26 Apr 2017 18:12:03 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds IOCTL processing in vfio_iommu_type1 for
> VFIO_IOMMU_SVM_BIND_TASK. Binds the PASID table bind by
> calling iommu_ops->bind_pasid_table to link the whole
> PASID table to pIOMMU.
> 
> For VT-d, it is linking the guest PASID table to host pIOMMU.
> This is key point to support SVM virtualization on VT-d.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b3cc33f..30b6d48 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +struct vfio_svm_task {
> +	struct iommu_domain *domain;
> +	void *payload;
> +};
> +
> +static int bind_pasid_tbl_fn(struct device *dev, void *data)
> +{
> +	int ret = 0;
> +	struct vfio_svm_task *task = data;

Maybe avoid "task" or use svm_task to differentiate from task_struct
task used elsewhere in this file.

> +	struct pasid_table_info *pasidt_binfo;
> +
> +	pasidt_binfo = task->payload;
> +	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
> +	return ret;
> +}
> +
> +static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
> +				int (*fn)(struct device *, void *))
> +{
> +	int ret = 0;
> +	struct vfio_domain *d;
> +	struct vfio_group *g;
> +	struct vfio_svm_task task;
> +
> +	task.payload = data;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		list_for_each_entry(g, &d->group_list, next) {
> +			if (g->iommu_group != NULL) {

Can it ever be NULL?

> +				task.domain = d->domain;
> +				ret = iommu_group_for_each_dev(
> +					g->iommu_group, &task, fn);
> +				if (ret != 0)
> +					break;
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
> +		struct vfio_device_svm hdr;
> +		u8 *data = NULL;

But it really should be a struct pasid_table_info.

> +		int ret = 0;
> +
> +		minsz = offsetofend(struct vfio_device_svm, length);
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (hdr.length == 0)
> +			return -EINVAL;
> +
> +		data = memdup_user((void __user *)(arg + minsz),
> +					hdr.length);
> +		if (IS_ERR(data))
> +			return PTR_ERR(data);
> +
> +		switch (hdr.flags & VFIO_SVM_TYPE_MASK) {
> +		case VFIO_SVM_BIND_PASIDTBL:
> +			ret = vfio_do_svm_task(iommu, data,
> +						bind_pasid_tbl_fn);
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			break;
> +		}
> +		kfree(data);
> +		return ret;
>  	}
>  
>  	return -ENOTTY;


* Re: [Qemu-devel] [RFC PATCH 6/8] VFIO: do pasid table binding
@ 2017-05-12 21:59     ` Alex Williamson
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Liu, Yi L

On Wed, 26 Apr 2017 18:12:03 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> This patch adds IOCTL processing in vfio_iommu_type1 for
> VFIO_IOMMU_SVM_BIND_TASK. It binds the PASID table by
> calling iommu_ops->bind_pasid_table to link the whole
> PASID table to the pIOMMU.
> 
> For VT-d, this links the guest PASID table to the host pIOMMU.
> This is the key point in supporting SVM virtualization on VT-d.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index b3cc33f..30b6d48 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1512,6 +1512,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +struct vfio_svm_task {
> +	struct iommu_domain *domain;
> +	void *payload;
> +};
> +
> +static int bind_pasid_tbl_fn(struct device *dev, void *data)
> +{
> +	int ret = 0;
> +	struct vfio_svm_task *task = data;

Maybe avoid "task" or use svm_task to differentiate from task_struct
task used elsewhere in this file.

> +	struct pasid_table_info *pasidt_binfo;
> +
> +	pasidt_binfo = task->payload;
> +	ret = iommu_bind_pasid_table(task->domain, dev, pasidt_binfo);
> +	return ret;
> +}
> +
> +static int vfio_do_svm_task(struct vfio_iommu *iommu, void *data,
> +				int (*fn)(struct device *, void *))
> +{
> +	int ret = 0;
> +	struct vfio_domain *d;
> +	struct vfio_group *g;
> +	struct vfio_svm_task task;
> +
> +	task.payload = data;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		list_for_each_entry(g, &d->group_list, next) {
> +			if (g->iommu_group != NULL) {

Can it ever be NULL?

> +				task.domain = d->domain;
> +				ret = iommu_group_for_each_dev(
> +					g->iommu_group, &task, fn);
> +				if (ret != 0)
> +					break;
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1582,6 +1626,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_SVM_BIND_TASK) {
> +		struct vfio_device_svm hdr;
> +		u8 *data = NULL;

But it really should be a struct pasid_table_info.

> +		int ret = 0;
> +
> +		minsz = offsetofend(struct vfio_device_svm, length);
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (hdr.length == 0)
> +			return -EINVAL;
> +
> +		data = memdup_user((void __user *)(arg + minsz),
> +					hdr.length);
> +		if (IS_ERR(data))
> +			return PTR_ERR(data);
> +
> +		switch (hdr.flags & VFIO_SVM_TYPE_MASK) {
> +		case VFIO_SVM_BIND_PASIDTBL:
> +			ret = vfio_do_svm_task(iommu, data,
> +						bind_pasid_tbl_fn);
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			break;
> +		}
> +		kfree(data);
> +		return ret;
>  	}
>  
>  	return -ENOTTY;


* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-26 10:11   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:59       ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w, Liu, Yi L,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Wed, 26 Apr 2017 18:11:58 +0800
"Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> From: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> 
> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> case in the guest:
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> 
> As part of the proposed architecture, when a SVM capable PCI
> device is assigned to a guest, nested mode is turned on. Guest owns the
> first level page tables (request with PASID) and performs GVA->GPA
> translation. Second level page tables are owned by the host for GPA->HPA
> translation for both requests with and without PASID.
> 
> A new IOMMU driver interface is therefore needed to perform tasks as
> follows:
> * Enable nested translation and appropriate translation type
> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> 
> This patch introduces new functions called iommu_(un)bind_pasid_table()
> to IOMMU APIs. Architecture specific IOMMU function can be added later
> to perform the specific steps for binding pasid table of assigned devices.
> 
> This patch also adds model definition in iommu.h. It would be used to
> check if the bind request is from a compatible entity. e.g. a bind
> request from an intel_iommu emulator may not be supported by an ARM SMMU
> driver.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index dbe7f65..f2da636 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)
> +{
> +	if (unlikely(!domain->ops->bind_pasid_table))
> +		return -EINVAL;
> +
> +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> +
> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> +{
> +	if (unlikely(!domain->ops->unbind_pasid_table))
> +		return -EINVAL;
> +
> +	return domain->ops->unbind_pasid_table(domain, dev);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0ff5111..491a011 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -131,6 +131,15 @@ struct iommu_dm_region {
>  	int			prot;
>  };
>  
> +struct pasid_table_info {
> +	__u64	ptr;	/* PASID table ptr */
> +	__u64	size;	/* PASID table size*/
> +	__u32	model;	/* magic number */
> +#define INTEL_IOMMU	(1 << 0)
> +#define ARM_SMMU	(1 << 1)
> +	__u8	opaque[];/* IOMMU-specific details */
> +};

This needs to be in uapi since you're expecting a user to pass it.
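
To illustrate what the user side would depend on, here is a small
sketch of the same layout with fixed-width types and a sizing helper.
It mirrors the fields quoted above; uint64_t/uint32_t/uint8_t stand in
for the __u64/__u32/__u8 uapi types, and pasid_table_info_len() is a
hypothetical helper, not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as the quoted struct, spelled with stdint types. */
struct pasid_table_info {
	uint64_t ptr;		/* PASID table ptr */
	uint64_t size;		/* PASID table size */
	uint32_t model;		/* magic number */
	uint8_t  opaque[];	/* IOMMU-specific details */
};

/*
 * Hypothetical helper: number of bytes a user has to allocate for the
 * fixed header plus opaque_len bytes of model-specific data.
 */
size_t pasid_table_info_len(size_t opaque_len)
{
	return offsetof(struct pasid_table_info, opaque) + opaque_len;
}
```

With this layout opaque[] starts at byte 20, so model-specific data
containing 64-bit fields would not be naturally aligned unless explicit
padding is added after model.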

> +
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> @@ -159,6 +168,8 @@ struct iommu_dm_region {
>   * @domain_get_windows: Return the number of windows for a domain
>   * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @bind_pasid_table: bind pasid table pointer for guest SVM
> + * @unbind_pasid_table: unbind pasid table pointer and restore defaults
>   */
>  struct iommu_ops {
>  	bool (*capable)(enum iommu_cap);
> @@ -200,6 +211,10 @@ struct iommu_ops {
>  	u32 (*domain_get_windows)(struct iommu_domain *domain);
>  
>  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> +	int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
> +				struct pasid_table_info *pasidt_binfo);
> +	int (*unbind_pasid_table)(struct iommu_domain *domain,
> +				struct device *dev);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  			       struct device *dev);
>  extern void iommu_detach_device(struct iommu_domain *domain,
>  				struct device *dev);
> +extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo);
> +extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
> +				struct device *dev);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		     phys_addr_t paddr, size_t size, int prot);
> @@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct fwnode_handle *fwnode)
>  	return NULL;
>  }
>  
> +static inline
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)
> +{
> +	return -EINVAL;
> +}
> +static inline
> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> +{
> +	return -EINVAL;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #endif /* __LINUX_IOMMU_H */


* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
@ 2017-05-12 21:59       ` Alex Williamson
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Jacob Pan, Liu,
	Yi L

On Wed, 26 Apr 2017 18:11:58 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 
> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> case in the guest:
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> 
> As part of the proposed architecture, when a SVM capable PCI
> device is assigned to a guest, nested mode is turned on. Guest owns the
> first level page tables (request with PASID) and performs GVA->GPA
> translation. Second level page tables are owned by the host for GPA->HPA
> translation for both requests with and without PASID.
> 
> A new IOMMU driver interface is therefore needed to perform tasks as
> follows:
> * Enable nested translation and appropriate translation type
> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> 
> This patch introduces new functions called iommu_(un)bind_pasid_table()
> to IOMMU APIs. Architecture specific IOMMU function can be added later
> to perform the specific steps for binding pasid table of assigned devices.
> 
> This patch also adds model definition in iommu.h. It would be used to
> check if the bind request is from a compatible entity. e.g. a bind
> request from an intel_iommu emulator may not be supported by an ARM SMMU
> driver.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index dbe7f65..f2da636 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)
> +{
> +	if (unlikely(!domain->ops->bind_pasid_table))
> +		return -EINVAL;
> +
> +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> +
> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> +{
> +	if (unlikely(!domain->ops->unbind_pasid_table))
> +		return -EINVAL;
> +
> +	return domain->ops->unbind_pasid_table(domain, dev);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0ff5111..491a011 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -131,6 +131,15 @@ struct iommu_dm_region {
>  	int			prot;
>  };
>  
> +struct pasid_table_info {
> +	__u64	ptr;	/* PASID table ptr */
> +	__u64	size;	/* PASID table size*/
> +	__u32	model;	/* magic number */
> +#define INTEL_IOMMU	(1 << 0)
> +#define ARM_SMMU	(1 << 1)
> +	__u8	opaque[];/* IOMMU-specific details */
> +};

This needs to be in uapi since you're expecting a user to pass it.

> +
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> @@ -159,6 +168,8 @@ struct iommu_dm_region {
>   * @domain_get_windows: Return the number of windows for a domain
>   * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @bind_pasid_table: bind pasid table pointer for guest SVM
> + * @unbind_pasid_table: unbind pasid table pointer and restore defaults
>   */
>  struct iommu_ops {
>  	bool (*capable)(enum iommu_cap);
> @@ -200,6 +211,10 @@ struct iommu_ops {
>  	u32 (*domain_get_windows)(struct iommu_domain *domain);
>  
>  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> +	int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
> +				struct pasid_table_info *pasidt_binfo);
> +	int (*unbind_pasid_table)(struct iommu_domain *domain,
> +				struct device *dev);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  			       struct device *dev);
>  extern void iommu_detach_device(struct iommu_domain *domain,
>  				struct device *dev);
> +extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo);
> +extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
> +				struct device *dev);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		     phys_addr_t paddr, size_t size, int prot);
> @@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct fwnode_handle *fwnode)
>  	return NULL;
>  }
>  
> +static inline
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)
> +{
> +	return -EINVAL;
> +}
> +static inline
> +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> +{
> +	return -EINVAL;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #endif /* __LINUX_IOMMU_H */


* Re: [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:59       ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w, Liu, Yi L,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Wed, 26 Apr 2017 18:12:01 +0800
"Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> From: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> 
> This patch adds an Intel VT-d specific function to implement the
> iommu_do_invalidate API.
> 
> The use case is supporting caching structure invalidation
> of assigned SVM capable devices. The emulated IOMMU exposes the queued
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
> 
> The assumption is that the guest to host device ID mapping should be
> resolved prior to calling the IOMMU driver. Based on the device handle,
> the host IOMMU driver can replace certain fields before submitting the
> descriptor to the invalidation queue.
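
For reference, the field replacement described above comes down to a
masked update of the low descriptor word. A userspace sketch, with the
GENMASK() values expanded by hand and without the atomicity that
set_mask_bits() provides; qi_replace_did() is a hypothetical name:

```c
#include <stdint.h>

/* Values mirror the macros added by this patch. */
#define QI_DID_MASK	0xffff0000ULL		/* GENMASK(31, 16) */
#define QI_DID(did)	(((uint64_t)(did) & 0xffff) << 16)

/*
 * Non-atomic equivalent of
 *	set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
 * Clear the guest-provided domain ID and insert the host one, leaving
 * every other bit of the descriptor untouched.
 */
uint64_t qi_replace_did(uint64_t low, uint16_t did)
{
	return (low & ~QI_DID_MASK) | QI_DID(did);
}
```
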
> 
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
>  drivers/iommu/intel-iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h | 11 +++++++++++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6d5b939..0b098ad 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
>  	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
>  }
>  
> +static int intel_iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +	struct intel_iommu *iommu;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct intel_invalidate_data *inv_data;
> +	struct qi_desc *qi;
> +	u16 did;
> +	u8 bus, devfn;
> +
> +	if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
> +		return -EINVAL;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
> +
> +	/* check SID */
> +	if (PCI_DEVID(bus, devfn) != inv_data->sid)
> +		return 0;
> +
> +	qi = &inv_data->inv_desc;
> +
> +	switch (qi->low & QI_TYPE_MASK) {
> +	case QI_DIOTLB_TYPE:
> +	case QI_DEIOTLB_TYPE:
> +		/* for device IOTLB, we just let it pass through */
> +		break;
> +	default:
> +		did = dmar_domain->iommu_did[iommu->seq_id];
> +		set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
> +		break;
> +	}
> +
> +	ret = qi_submit_sync(qi, iommu);
> +
> +	return ret;

nit, ret variable is unnecessary.

> +}
> +
>  static int intel_iommu_map(struct iommu_domain *domain,
>  			   unsigned long iova, phys_addr_t hpa,
>  			   size_t size, int iommu_prot)
> @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	.bind_pasid_table	= intel_iommu_bind_pasid_table,
>  	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
> +	.do_invalidate		= intel_iommu_do_invalidate,
>  #endif
>  	.map		= intel_iommu_map,
>  	.unmap		= intel_iommu_unmap,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ac04f28..9d6562c 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -29,6 +29,7 @@
>  #include <linux/dma_remapping.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/list.h>
> +#include <linux/bitops.h>
>  #include <asm/cacheflush.h>
>  #include <asm/iommu.h>
>  
> @@ -271,6 +272,10 @@ enum {
>  #define QI_PGRP_RESP_TYPE	0x9
>  #define QI_PSTRM_RESP_TYPE	0xa
>  
> +#define QI_DID(did)		(((u64)did & 0xffff) << 16)
> +#define QI_DID_MASK		GENMASK(31, 16)
> +#define QI_TYPE_MASK		GENMASK(3, 0)
> +
>  #define QI_IEC_SELECTIVE	(((u64)1) << 4)
>  #define QI_IEC_IIDEX(idx)	(((u64)(idx & 0xffff) << 32))
>  #define QI_IEC_IM(m)		(((u64)(m & 0x1f) << 27))
> @@ -529,6 +534,12 @@ struct intel_svm {
>  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
>  #endif
>  
> +struct intel_invalidate_data {
> +	u16 sid;
> +	u32 pasid;
> +	struct qi_desc inv_desc;
> +};

This needs to be uapi since the vfio user is expected to create it, so
we need a uapi version of qi_desc too.
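
Something along these lines, as an untested sketch only. The _uapi
names and the explicit padding field are assumptions on my part, and
uint16_t/uint32_t/uint64_t stand in for __u16/__u32/__u64:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uapi counterpart of struct qi_desc (two 64-bit words). */
struct qi_desc_uapi {
	uint64_t low;
	uint64_t high;
};

/*
 * Hypothetical uapi layout. The explicit padding field replaces the
 * hole the compiler would otherwise insert between sid and pasid, so
 * the layout no longer depends on implicit alignment rules.
 */
struct intel_invalidate_data_uapi {
	uint16_t sid;
	uint16_t padding;
	uint32_t pasid;
	struct qi_desc_uapi inv_desc;
};
```

The descriptor then starts at byte 8 and the whole structure is 24
bytes.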

> +
>  extern const struct attribute_group *intel_iommu_groups[];
>  extern void intel_iommu_debugfs_init(void);
>  extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,


* Re: [Qemu-devel] [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function
@ 2017-05-12 21:59       ` Alex Williamson
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Jacob Pan, Liu,
	Yi L

On Wed, 26 Apr 2017 18:12:01 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 
> This patch adds an Intel VT-d specific function to implement the
> iommu_do_invalidate API.
> 
> The use case is supporting caching structure invalidation
> of assigned SVM capable devices. The emulated IOMMU exposes the queued
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
> 
> The assumption is that the guest to host device ID mapping should be
> resolved prior to calling the IOMMU driver. Based on the device handle,
> the host IOMMU driver can replace certain fields before submitting the
> descriptor to the invalidation queue.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h | 11 +++++++++++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6d5b939..0b098ad 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
>  	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
>  }
>  
> +static int intel_iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +	struct intel_iommu *iommu;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct intel_invalidate_data *inv_data;
> +	struct qi_desc *qi;
> +	u16 did;
> +	u8 bus, devfn;
> +
> +	if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
> +		return -EINVAL;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
> +
> +	/* check SID */
> +	if (PCI_DEVID(bus, devfn) != inv_data->sid)
> +		return 0;
> +
> +	qi = &inv_data->inv_desc;
> +
> +	switch (qi->low & QI_TYPE_MASK) {
> +	case QI_DIOTLB_TYPE:
> +	case QI_DEIOTLB_TYPE:
> +		/* for device IOTLB, we just let it pass through */
> +		break;
> +	default:
> +		did = dmar_domain->iommu_did[iommu->seq_id];
> +		set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
> +		break;
> +	}
> +
> +	ret = qi_submit_sync(qi, iommu);
> +
> +	return ret;

nit, ret variable is unnecessary.

> +}
> +
>  static int intel_iommu_map(struct iommu_domain *domain,
>  			   unsigned long iova, phys_addr_t hpa,
>  			   size_t size, int iommu_prot)
> @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	.bind_pasid_table	= intel_iommu_bind_pasid_table,
>  	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
> +	.do_invalidate		= intel_iommu_do_invalidate,
>  #endif
>  	.map		= intel_iommu_map,
>  	.unmap		= intel_iommu_unmap,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ac04f28..9d6562c 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -29,6 +29,7 @@
>  #include <linux/dma_remapping.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/list.h>
> +#include <linux/bitops.h>
>  #include <asm/cacheflush.h>
>  #include <asm/iommu.h>
>  
> @@ -271,6 +272,10 @@ enum {
>  #define QI_PGRP_RESP_TYPE	0x9
>  #define QI_PSTRM_RESP_TYPE	0xa
>  
> +#define QI_DID(did)		(((u64)did & 0xffff) << 16)
> +#define QI_DID_MASK		GENMASK(31, 16)
> +#define QI_TYPE_MASK		GENMASK(3, 0)
> +
>  #define QI_IEC_SELECTIVE	(((u64)1) << 4)
>  #define QI_IEC_IIDEX(idx)	(((u64)(idx & 0xffff) << 32))
>  #define QI_IEC_IM(m)		(((u64)(m & 0x1f) << 27))
> @@ -529,6 +534,12 @@ struct intel_svm {
>  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
>  #endif
>  
> +struct intel_invalidate_data {
> +	u16 sid;
> +	u32 pasid;
> +	struct qi_desc inv_desc;
> +};

This needs to be uapi since the vfio user is expected to create it, so
we need a uapi version of qi_desc too.

> +
>  extern const struct attribute_group *intel_iommu_groups[];
>  extern void intel_iommu_debugfs_init(void);
>  extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,


* Re: [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
  2017-04-26 10:12   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:59       ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w, Liu, Yi L,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Wed, 26 Apr 2017 18:12:00 +0800
"Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> 
> When a SVM capable device is assigned to a guest, the first level page
> tables are owned by the guest and the guest PASID table pointer is
> linked to the device context entry of the physical IOMMU.
> 
> The host IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation activities are passed down to the host. The
> primary usage is derived from the emulated IOMMU in the guest, where QEMU
> can trap invalidation activities before passing them down to the
> host/physical IOMMU. IOMMU architecture-specific actions need to be
> taken, which requires the generic APIs introduced in this patch to
> carry opaque data in the tlb_invalidate_info argument.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
>  drivers/iommu/iommu.c | 13 +++++++++++++
>  include/linux/iommu.h | 16 ++++++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f2da636..ca7cff2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
>  
> +int iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +
> +	if (unlikely(domain->ops->do_invalidate == NULL))
> +		return -ENODEV;
> +
> +	ret = domain->ops->do_invalidate(domain, dev, inv_info);
> +	return ret;

nit, ret is unnecessary.
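
That is, the wrapper can return the callback's result directly. A
compilable userspace sketch with minimal stand-ins for the kernel
types; unlikely() is dropped here and fake_do_invalidate() is a
hypothetical callback that exists only to exercise the wrapper:

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types, just enough to compile. */
struct device;
struct tlb_invalidate_info;
struct iommu_domain;

struct iommu_ops {
	int (*do_invalidate)(struct iommu_domain *domain, struct device *dev,
			     struct tlb_invalidate_info *inv_info);
};

struct iommu_domain {
	const struct iommu_ops *ops;
};

/* Same behaviour as the quoted hunk, without the temporary. */
int iommu_do_invalidate(struct iommu_domain *domain, struct device *dev,
			struct tlb_invalidate_info *inv_info)
{
	if (!domain->ops->do_invalidate)
		return -ENODEV;

	return domain->ops->do_invalidate(domain, dev, inv_info);
}

/* Hypothetical driver callback that always reports success. */
int fake_do_invalidate(struct iommu_domain *domain, struct device *dev,
		       struct tlb_invalidate_info *inv_info)
{
	(void)domain;
	(void)dev;
	(void)inv_info;
	return 0;
}
```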

> +}
> +EXPORT_SYMBOL_GPL(iommu_do_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 491a011..a48e3b75 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -140,6 +140,11 @@ struct pasid_table_info {
>  	__u8	opaque[];/* IOMMU-specific details */
>  };
>  
> +struct tlb_invalidate_info {
> +	__u32	model;
> +	__u8	opaque[];
> +};

I'm wondering if 'model' is really necessary here. Shouldn't this
function only be called if a bind_pasid_table() succeeded, in which
case the model would already have been set at that time?

This also needs to be uapi since you're expecting a user to provide it
to vfio.  The opaque data needs to be fully specified (relative to
uapi) per model.
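
One possible shape of that, purely as a userspace illustration: the
state struct, function names and error codes below are hypothetical and
not something this series defines. The model is recorded once at bind
time and the invalidate path relies on it instead of taking it from the
user again:

```c
#include <errno.h>
#include <stdint.h>

#define INTEL_IOMMU	(1 << 0)
#define ARM_SMMU	(1 << 1)

/* Hypothetical per-domain state; not a structure from the patch. */
struct svm_domain_state {
	uint32_t bound_model;	/* 0 while no PASID table is bound */
};

/* Record the model when the PASID table is bound. */
int svm_bind(struct svm_domain_state *st, uint32_t model)
{
	if (model != INTEL_IOMMU && model != ARM_SMMU)
		return -EINVAL;
	if (st->bound_model)
		return -EBUSY;

	st->bound_model = model;
	return 0;
}

/* Invalidate takes no model: it is only valid after a successful bind. */
int svm_invalidate(const struct svm_domain_state *st)
{
	if (!st->bound_model)
		return -EINVAL;

	/* Dispatch on st->bound_model would go here. */
	return 0;
}
```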

> +
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> @@ -215,6 +220,8 @@ struct iommu_ops {
>  				struct pasid_table_info *pasidt_binfo);
>  	int (*unbind_pasid_table)(struct iommu_domain *domain,
>  				struct device *dev);
> +	int (*do_invalidate)(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain,
>  		struct device *dev, struct pasid_table_info *pasidt_binfo);
>  extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
>  				struct device *dev);
> +extern int iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info);
> +
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		     phys_addr_t paddr, size_t size, int prot);
> @@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
>  	return -EINVAL;
>  }
>  
> +static inline int iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	return -EINVAL;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #endif /* __LINUX_IOMMU_H */


* Re: [Qemu-devel] [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
@ 2017-05-12 21:59       ` Alex Williamson
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian, ashok.raj,
	jacob.jun.pan, tianyu.lan, jean-philippe.brucker, Liu, Yi L,
	Jacob Pan

On Wed, 26 Apr 2017 18:12:00 +0800
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> When a SVM capable device is assigned to a guest, the first level page
> tables are owned by the guest and the guest PASID table pointer is
> linked to the device context entry of the physical IOMMU.
> 
> The host IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation activities are passed down to the host. The
> primary usage is derived from the emulated IOMMU in the guest, where QEMU
> can trap invalidation activities before passing them down to the
> host/physical IOMMU. IOMMU architecture-specific actions need to be
> taken, which requires the generic APIs introduced in this patch to
> carry opaque data in the tlb_invalidate_info argument.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/iommu.c | 13 +++++++++++++
>  include/linux/iommu.h | 16 ++++++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f2da636..ca7cff2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
>  
> +int iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +
> +	if (unlikely(domain->ops->do_invalidate == NULL))
> +		return -ENODEV;
> +
> +	ret = domain->ops->do_invalidate(domain, dev, inv_info);
> +	return ret;

nit, ret is unnecessary.

> +}
> +EXPORT_SYMBOL_GPL(iommu_do_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 491a011..a48e3b75 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -140,6 +140,11 @@ struct pasid_table_info {
>  	__u8	opaque[];/* IOMMU-specific details */
>  };
>  
> +struct tlb_invalidate_info {
> +	__u32	model;
> +	__u8	opaque[];
> +};

I'm wondering if 'model' is really necessary here, shouldn't this
function only be called if a bind_pasid_table() succeeded, and then the
model would be set at that time?

This also needs to be uapi since you're expecting a user to provide it
to vfio.  The opaque data needs to be fully specified (relative to
uapi) per model.
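
One way to read the 'model' question is that the domain could remember
the model at bind time, so that the invalidate path does not need it
from the caller again. The sketch below only illustrates that idea in
userspace C. Every name in it (demo_domain, demo_bind, and so on) is
hypothetical and not part of the series or of the kernel.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model identifiers, not the values used by the series. */
enum demo_model {
	DEMO_MODEL_NONE  = 0,
	DEMO_MODEL_INTEL = 1,
};

struct demo_domain {
	uint32_t bound_model;	/* set by a successful bind, cleared on unbind */
};

/* Record the model only when the bind request is acceptable. */
int demo_bind(struct demo_domain *d, uint32_t model)
{
	if (model != DEMO_MODEL_INTEL)
		return -EINVAL;

	d->bound_model = model;
	return 0;
}

void demo_unbind(struct demo_domain *d)
{
	d->bound_model = DEMO_MODEL_NONE;
}

/* Invalidation is refused unless a bind succeeded earlier. */
int demo_invalidate(const struct demo_domain *d)
{
	if (d->bound_model == DEMO_MODEL_NONE)
		return -ENODEV;

	/* The payload would be decoded according to d->bound_model here. */
	return 0;
}
```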

> +
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> @@ -215,6 +220,8 @@ struct iommu_ops {
>  				struct pasid_table_info *pasidt_binfo);
>  	int (*unbind_pasid_table)(struct iommu_domain *domain,
>  				struct device *dev);
> +	int (*do_invalidate)(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain,
>  		struct device *dev, struct pasid_table_info *pasidt_binfo);
>  extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
>  				struct device *dev);
> +extern int iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info);
> +
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		     phys_addr_t paddr, size_t size, int prot);
> @@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
>  	return -EINVAL;
>  }
>  
> +static inline int iommu_do_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	return -EINVAL;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #endif /* __LINUX_IOMMU_H */

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function
  2017-04-26 10:11   ` [Qemu-devel] " Liu, Yi L
@ 2017-05-12 21:59       ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-05-12 21:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w, Liu, Yi L,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Wed, 26 Apr 2017 18:11:59 +0800
"Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> From: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> 
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
> 
> The primary use case is direct assignment of an SVM capable
> device. Originating from the emulated IOMMU in the guest, the request
> goes through many layers (e.g. VFIO). When calling the host IOMMU
> driver, the caller passes the guest PASID table pointer (GPA) and size.
> 
> The device context table entry is modified by the Intel IOMMU specific
> bind_pasid_table function, which turns on nesting mode and the matching
> translation type.
> 
> The unbind operation restores the default context mapping.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> ---
>  drivers/iommu/intel-iommu.c   | 103 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 104 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 646756c..6d5b939 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5306,6 +5306,105 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
>  
>  	return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo)
> +{
> +	struct intel_iommu *iommu;
> +	struct context_entry *context;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	u8 bus, devfn;
> +	u16 did, *sid;
> +	int ret = 0;
> +	unsigned long flags;
> +	u64 ctx_lo;
> +
> +	if (pasidt_binfo == NULL || pasidt_binfo->model != INTEL_IOMMU) {
> +		pr_warn("%s: Invalid bind request!\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	sid = (u16 *)&pasidt_binfo->opaque;

struct pasid_table_info is expected to be provided by a user, so the
opaque data structure for model == INTEL_IOMMU needs to be documented
in uapi.

> +	/* check SID, if it is not correct, return */
> +	if (PCI_DEVID(bus, devfn) != *sid)
> +		return 0;

This is a bit weird; it took me until later in the series to understand
why this is a success case.  Perhaps the device matching needs to be
standardized in pasid_table_info rather than in the opaque data.
At a minimum, this needs more comments.

> +
> +	info = dev->archdata.iommu;
> +	if (!info || !info->pasid_supported) {
> +		pr_err("Device %d:%d.%d has no pasid support\n", bus,
> +			PCI_SLOT(devfn), PCI_FUNC(devfn));

PCI addresses should be printed in hex and include the segment.  This
also looks like it might be user reachable, so a user could DoS the
host by repeatedly calling this for a device without pasid support and
filling the logs with pr_err.  Maybe dropping the pr_err is the better
choice.
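
For reference, the usual segment:bus:slot.func layout can be sketched
in plain C as below. The DEMO_* macros mirror the bit layout of the
kernel's PCI_SLOT()/PCI_FUNC(), and demo_format_pci_addr() is a
hypothetical helper for illustration, not an existing kernel function.

```c
#include <stdint.h>
#include <stdio.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define DEMO_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define DEMO_PCI_FUNC(devfn)	((devfn) & 0x07)

/* Format a PCI address as ssss:bb:dd.f, all fields in hex. */
int demo_format_pci_addr(char *buf, size_t len, uint16_t segment,
			 uint8_t bus, uint8_t devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x",
			(unsigned int)segment, (unsigned int)bus,
			(unsigned int)DEMO_PCI_SLOT(devfn),
			(unsigned int)DEMO_PCI_FUNC(devfn));
}
```

With segment 0, bus 0x3a and devfn 0x10 this yields "0000:3a:02.0",
instead of the decimal "58:2.0" that the quoted "%d:%d.%d" would print.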


> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (pasidt_binfo->size >= intel_iommu_get_pts(iommu)) {
> +		pr_err("Invalid gPASID table size %llu, host size %lu\n",
> +			pasidt_binfo->size,
> +			intel_iommu_get_pts(iommu));
> +		ret = -EINVAL;
> +		goto out;

Is an equal size not valid?

> +	}
> +	spin_lock_irqsave(&iommu->lock, flags);
> +	context = iommu_context_addr(iommu, bus, devfn, 0);
> +	if (!context || !context_present(context)) {
> +		pr_warn("%s: ctx not present for bus devfn %x:%x\n",
> +			__func__, bus, devfn);

Use standard PCI address format, including segment.

> +		spin_unlock_irqrestore(&iommu->lock, flags);
> +		goto out;
> +	}
> +	/* Anticipate guest to use SVM and owns the first level */
> +	ctx_lo = context[0].lo;
> +	ctx_lo |= CONTEXT_NESTE;
> +	ctx_lo |= CONTEXT_PRS;
> +	ctx_lo |= CONTEXT_PASIDE;
> +	ctx_lo &= ~CONTEXT_TT_MASK;
> +	ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
> +	context[0].lo = ctx_lo;
> +
> +	/* Assign guest PASID table pointer and size */
> +	ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
> +	context[1].lo = ctx_lo;
> +	/* make sure context entry is updated before flushing */
> +	wmb();
> +	did = dmar_domain->iommu_did[iommu->seq_id];
> +	iommu->flush.flush_context(iommu, did,
> +				(((u16)bus) << 8) | devfn,
> +				DMA_CCMD_MASK_NOBIT,
> +				DMA_CCMD_DEVICE_INVL);
> +	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> +	spin_unlock_irqrestore(&iommu->lock, flags);

Mildly concerned what sort of Pandora's box this opens, but I guess
we're relying on the 2nd level translation to validate and make sure
the user can only hurt themselves.
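
As a toy illustration of that reliance (not the VT-d page-table format,
and with hypothetical names throughout): the guest controls the first
lookup (GVA->GPA), the host controls the second (GPA->HPA), and any GPA
the host has not mapped faults instead of reaching host memory.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_FAULT	UINT64_MAX

/* One page-granular mapping; a real IOMMU walks multi-level tables. */
struct demo_map {
	uint64_t in;
	uint64_t out;
};

static uint64_t demo_lookup(const struct demo_map *tbl, size_t n, uint64_t in)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].in == in)
			return tbl[i].out;

	return DEMO_FAULT;
}

/*
 * l1 is guest-controlled (GVA->GPA), l2 is host-controlled (GPA->HPA).
 * Whatever the guest puts in l1, the result is either a fault or an
 * HPA that the host chose to expose through l2.
 */
uint64_t demo_nested_translate(const struct demo_map *l1, size_t n1,
			       const struct demo_map *l2, size_t n2,
			       uint64_t gva)
{
	uint64_t gpa = demo_lookup(l1, n1, gva);

	if (gpa == DEMO_FAULT)
		return DEMO_FAULT;

	return demo_lookup(l2, n2, gpa);
}
```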

> +
> +
> +out:
> +	return ret;
> +}
> +
> +static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
> +					struct device *dev)
> +{
> +	struct intel_iommu *iommu;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	u8 bus, devfn;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +	/*
> +	 * REVISIT: we might want to clear the PASID table pointer
> +	 * as part of context clear operation. Currently, it leaves
> +	 * stale data but should be ignored by hardware since PASIDE
> +	 * is clear.
> +	 */
> +	/* ATS will be reenabled when remapping is restored */
> +	pci_disable_ats(to_pci_dev(dev));
> +	domain_context_clear(iommu, dev);
> +	return domain_context_mapping_one(dmar_domain, iommu, bus, devfn);
> +}
>  #endif /* CONFIG_INTEL_IOMMU_SVM */
>  
>  static const struct iommu_ops intel_iommu_ops = {
> @@ -5314,6 +5413,10 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
>  	.domain_free	= intel_iommu_domain_free,
>  	.attach_dev	= intel_iommu_attach_device,
>  	.detach_dev	= intel_iommu_detach_device,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.bind_pasid_table	= intel_iommu_bind_pasid_table,
> +	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
> +#endif
>  	.map		= intel_iommu_map,
>  	.unmap		= intel_iommu_unmap,
>  	.map_sg		= default_iommu_map_sg,
> diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
> index 187c102..c03b62a 100644
> --- a/include/linux/dma_remapping.h
> +++ b/include/linux/dma_remapping.h
> @@ -27,6 +27,7 @@
>  
>  #define CONTEXT_DINVE		(1ULL << 8)
>  #define CONTEXT_PRS		(1ULL << 9)
> +#define CONTEXT_NESTE		(1ULL << 10)
>  #define CONTEXT_PASIDE		(1ULL << 11)
>  
>  struct intel_iommu;

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-05-12 12:11     ` [Qemu-devel] " Jean-Philippe Brucker
@ 2017-05-14 10:12         ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-14 10:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > ---
> >  include/uapi/linux/vfio.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +	__u32	argsz;
> > +	__u32	length;
> > +	__u8	data[];
> > +};
> 
> We initially discussed something a little more generic than this, with
> most info explicitly described and only pIOMMU-specific quirks and hints
> in an opaque structure. Out of curiosity, why the change? I'm not against
> a fully opaque structure, but there seem to be a large overlap between TLB
> invalidations across architectures.

Hi Jean,

As my cover letter mentioned, the IOMMU TLB invalidate propagation is
still an open question. I am pasting the link here since it is in the
cover letter of the QEMU patchset. Please refer to the [Open] section at
the following link.

http://www.spinics.net/lists/kvm/msg148798.html

I want to see whether the community wants an opaque structure or not
for IOMMU TLB invalidate propagation. Personally, I am inclined to use
an opaque structure, but it is better to gather comments before
deciding. To assist the discussion, I put the fully opaque structure
here. Once the community reaches consensus on using an opaque structure
for IOMMU TLB invalidate propagation, I'm glad to work with you on a
partially opaque structure, since there seems to be overlap across
architectures.
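
For the fully opaque variant, the receiving side can still check the
header before touching the payload. Below is a rough userspace sketch
of such a check. It assumes that @length counts the bytes in @data and
that @argsz covers the header plus payload; neither is spelled out in
the patch, and the demo_ names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of the proposed vfio_iommu_tlb_invalidate. */
struct demo_tlb_invalidate {
	uint32_t argsz;		/* assumed: total size the caller provides */
	uint32_t length;	/* assumed: number of bytes in data[] */
	uint8_t  data[];
};

/* Reject headers whose sizes cannot describe a consistent buffer. */
int demo_check_invalidate_hdr(const struct demo_tlb_invalidate *hdr)
{
	size_t minsz = offsetof(struct demo_tlb_invalidate, data);

	if (hdr->argsz < minsz)
		return -EINVAL;

	/* The payload must fit in what the caller says it passed. */
	if (hdr->length > hdr->argsz - minsz)
		return -EINVAL;

	return 0;
}
```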

> 
> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> with the following.
> 
> (From the paravirtualized POV, the SMMU also has to swizzle endianness
> after unpacking an opaque structure, since userspace doesn't know what's
> in it and guest might use a different endianness. So we need to force all
> opaque data to be e.g. little-endian.)
> 
> struct vfio_iommu_tlb_invalidate {
> 	__u32	argsz;
> 	__u32	scope;
> 	__u32	flags;
> 	__u32	pasid;
> 	__u64	vaddr;
> 	__u64	size;
> 	__u8	data[];
> };
>
> Scope is a bitfield restricting the invalidation scope. By default,
> the whole container is invalidated (all PASIDs and all VAs), and @pasid,
> @vaddr and @size are unused.
> 
> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> scope to the pasid described by @pasid.
> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> scope to the address range described by (@vaddr, @size).
> 
> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> range for *all* pasids (as well as no_pasid). Setting scope =
> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> the VA range only for @pasid.
> 

Besides VA range flushing, there is PASID cache flushing on VT-d. How
about SMMU? So I think that, besides the two scopes you defined, we may
need one more to indicate PASID cache flushing.
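
To check my reading of the two scope bits above, here is a small
userspace sketch of how they would compose. The bit values are the ones
you gave; struct demo_inval and demo_inval_covers() are hypothetical,
and the NO_PASID flag and a possible PASID cache scope are left out.

```c
#include <stdint.h>

#define VFIO_IOMMU_INVALIDATE_PASID	(1u << 0)
#define VFIO_IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Only the fields needed to evaluate the scope. */
struct demo_inval {
	uint32_t scope;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
};

/* Return 1 if the request covers a cached translation for (pasid, va). */
int demo_inval_covers(const struct demo_inval *req, uint32_t pasid,
		      uint64_t va)
{
	if ((req->scope & VFIO_IOMMU_INVALIDATE_PASID) && req->pasid != pasid)
		return 0;

	if ((req->scope & VFIO_IOMMU_INVALIDATE_VADDR) &&
	    (va < req->vaddr || va - req->vaddr >= req->size))
		return 0;

	/* With no scope bits set, everything in the container is covered. */
	return 1;
}
```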

> Flags depend on the selected scope:
> 
> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> exclusively (some architectures, e.g. SMMU, allow this)
> 
> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> only the last-level entry (pte). This is a hint.
> 
> I guess what's missing for Intel IOMMU and would go in @data is the
> "global" hint (which we don't have in SMMU invalidations). Do you see
> anything else, that the pIOMMU cannot deduce from this structure?
> 

For Intel platforms, Drain Read/Write would be needed in the opaque data.

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-05-12 21:58     ` [Qemu-devel] " Alex Williamson
@ 2017-05-14 10:55       ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-14 10:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian,
	ashok.raj, jacob.jun.pan, tianyu.lan, jean-philippe.brucker

On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:04 +0800
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  include/uapi/linux/vfio.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +	__u32	argsz;
> > +	__u32	length;
> > +	__u8	data[];
> > +};
> > +
> > +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> 
> I'm kind of wondering why this isn't just a new flag bit on
> vfio_device_svm, the data structure is so similar.  Of course data
> needs to be fully specified in uapi.

Hi Alex,

This part depends on whether we use an opaque structure or not. It is
mentioned in the [Open] section at the following link.

http://www.spinics.net/lists/kvm/msg148798.html

If we pick the fully opaque solution for IOMMU TLB invalidate
propagation, then I may add a flag bit on vfio_device_svm and also add
the definition in uapi as you suggested.

Thanks,
Yi L

> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >  
> >  /*
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-05-12 21:59       ` [Qemu-devel] " Alex Williamson
@ 2017-05-14 10:56           ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-14 10:56 UTC (permalink / raw)
  To: Alex Williamson
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Fri, May 12, 2017 at 03:59:14PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:11:58 +0800
> "Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> 
> > From: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when an SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. The guest owns
> > the first level page tables (requests with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation of requests both with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> > +{
> > +	if (unlikely(!domain->ops->bind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	if (unlikely(!domain->ops->unbind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> >  	int			prot;
> >  };
> >  
> > +struct pasid_table_info {
> > +	__u64	ptr;	/* PASID table ptr */
> > +	__u64	size;	/* PASID table size*/
> > +	__u32	model;	/* magic number */
> > +#define INTEL_IOMMU	(1 << 0)
> > +#define ARM_SMMU	(1 << 1)
> > +	__u8	opaque[];/* IOMMU-specific details */
> > +};
> 
> This needs to be in uapi since you're expecting a user to pass it 

yes, it is. Thx for the correction.

Thanks,
Yi L
> > +
> >  #ifdef CONFIG_IOMMU_API
> >  
> >  /**
> > @@ -159,6 +168,8 @@ struct iommu_dm_region {
> >   * @domain_get_windows: Return the number of windows for a domain
> >   * @of_xlate: add OF master IDs to iommu grouping
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @bind_pasid_table: bind pasid table pointer for guest SVM
> > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults
> >   */
> >  struct iommu_ops {
> >  	bool (*capable)(enum iommu_cap);
> > @@ -200,6 +211,10 @@ struct iommu_ops {
> >  	u32 (*domain_get_windows)(struct iommu_domain *domain);
> >  
> >  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > +	int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
> > +				struct pasid_table_info *pasidt_binfo);
> > +	int (*unbind_pasid_table)(struct iommu_domain *domain,
> > +				struct device *dev);
> >  
> >  	unsigned long pgsize_bitmap;
> >  };
> > @@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
> >  			       struct device *dev);
> >  extern void iommu_detach_device(struct iommu_domain *domain,
> >  				struct device *dev);
> > +extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> > +		struct device *dev, struct pasid_table_info *pasidt_binfo);
> > +extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
> > +				struct device *dev);
> >  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> >  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> >  		     phys_addr_t paddr, size_t size, int prot);
> > @@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct fwnode_handle *fwnode)
> >  	return NULL;
> >  }
> >  
> > +static inline
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> > +{
> > +	return -EINVAL;
> > +}
> > +static inline
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	return -EINVAL;
> > +}
> > +
> >  #endif /* CONFIG_IOMMU_API */
> >  
> >  #endif /* __LINUX_IOMMU_H */
> 


* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
@ 2017-05-14 10:56           ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-14 10:56 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian,
	ashok.raj, jacob.jun.pan, tianyu.lan, jean-philippe.brucker,
	Jacob Pan

On Fri, May 12, 2017 at 03:59:14PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:11:58 +0800
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> > case in the guest:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> > 
> > As part of the proposed architecture, when a SVM capable PCI
> > device is assigned to a guest, nested mode is turned on. Guest owns the
> > first level page tables (request with PASID) and performs GVA->GPA
> > translation. Second level page tables are owned by the host for GPA->HPA
> > translation for both request with and without PASID.
> > 
> > A new IOMMU driver interface is therefore needed to perform tasks as
> > follows:
> > * Enable nested translation and appropriate translation type
> > * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> > 
> > This patch introduces new functions called iommu_(un)bind_pasid_table()
> > to IOMMU APIs. Architecture specific IOMMU function can be added later
> > to perform the specific steps for binding pasid table of assigned devices.
> > 
> > This patch also adds model definition in iommu.h. It would be used to
> > check if the bind request is from a compatible entity. e.g. a bind
> > request from an intel_iommu emulator may not be supported by an ARM SMMU
> > driver.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dbe7f65..f2da636 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> > +{
> > +	if (unlikely(!domain->ops->bind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> > +
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	if (unlikely(!domain->ops->unbind_pasid_table))
> > +		return -EINVAL;
> > +
> > +	return domain->ops->unbind_pasid_table(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0ff5111..491a011 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -131,6 +131,15 @@ struct iommu_dm_region {
> >  	int			prot;
> >  };
> >  
> > +struct pasid_table_info {
> > +	__u64	ptr;	/* PASID table ptr */
> > +	__u64	size;	/* PASID table size*/
> > +	__u32	model;	/* magic number */
> > +#define INTEL_IOMMU	(1 << 0)
> > +#define ARM_SMMU	(1 << 1)
> > +	__u8	opaque[];/* IOMMU-specific details */
> > +};
> 
> This needs to be in uapi since you're expecting a user to pass it 

yes, it is. Thx for the correction.

Thanks,
Yi L
> > +
> >  #ifdef CONFIG_IOMMU_API
> >  
> >  /**
> > @@ -159,6 +168,8 @@ struct iommu_dm_region {
> >   * @domain_get_windows: Return the number of windows for a domain
> >   * @of_xlate: add OF master IDs to iommu grouping
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @bind_pasid_table: bind pasid table pointer for guest SVM
> > + * @unbind_pasid_table: unbind pasid table pointer and restore defaults
> >   */
> >  struct iommu_ops {
> >  	bool (*capable)(enum iommu_cap);
> > @@ -200,6 +211,10 @@ struct iommu_ops {
> >  	u32 (*domain_get_windows)(struct iommu_domain *domain);
> >  
> >  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > +	int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
> > +				struct pasid_table_info *pasidt_binfo);
> > +	int (*unbind_pasid_table)(struct iommu_domain *domain,
> > +				struct device *dev);
> >  
> >  	unsigned long pgsize_bitmap;
> >  };
> > @@ -221,6 +236,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
> >  			       struct device *dev);
> >  extern void iommu_detach_device(struct iommu_domain *domain,
> >  				struct device *dev);
> > +extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> > +		struct device *dev, struct pasid_table_info *pasidt_binfo);
> > +extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
> > +				struct device *dev);
> >  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> >  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> >  		     phys_addr_t paddr, size_t size, int prot);
> > @@ -595,6 +614,18 @@ const struct iommu_ops *iommu_get_instance(struct fwnode_handle *fwnode)
> >  	return NULL;
> >  }
> >  
> > +static inline
> > +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> > +			struct pasid_table_info *pasidt_binfo)
> > +{
> > +	return -EINVAL;
> > +}
> > +static inline
> > +int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	return -EINVAL;
> > +}
> > +
> >  #endif /* CONFIG_IOMMU_API */
> >  
> >  #endif /* __LINUX_IOMMU_H */
> 


* Re: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-05-14 10:12         ` [Qemu-devel] " Liu, Yi L
@ 2017-05-15 12:14           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-15 12:14 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, jasowang,
	qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan, tianyu.lan

On 14/05/17 11:12, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
>> Hi Yi,
>>
>> On 26/04/17 11:12, Liu, Yi L wrote:
>>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
>>>
>>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
>>> invalidate request from guest to host.
>>>
>>> In the case of SVM virtualization on VT-d, host IOMMU driver has
>>> no knowledge of caching structure updates unless the guest
>>> invalidation activities are passed down to the host. So a new
>>> IOCTL is needed to propagate the guest cache invalidation through
>>> VFIO.
>>>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  include/uapi/linux/vfio.h | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 6b97987..50c51f8 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>>>  
>>>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>>>  
>>> +/* For IOMMU TLB Invalidation Propagation */
>>> +struct vfio_iommu_tlb_invalidate {
>>> +	__u32	argsz;
>>> +	__u32	length;
>>> +	__u8	data[];
>>> +};
>>
>> We initially discussed something a little more generic than this, with
>> most info explicitly described and only pIOMMU-specific quirks and hints
>> in an opaque structure. Out of curiosity, why the change? I'm not against
>> a fully opaque structure, but there seem to be a large overlap between TLB
>> invalidations across architectures.
> 
> Hi Jean,
> 
> As my cover letter mentioned, it is an open on the iommu tlb invalidate
> propagation. Paste it here since it's in the cover letter for Qemu part
> changes. Pls refer to the [Open] session in the following link.
> 
> http://www.spinics.net/lists/kvm/msg148798.html
> 
> I want to see if community wants to have opaque structure or not
> on iommu tlb invalidate propagation. Personally, I incline to use
> opaque structure. But it's better to gather the comments before
> deciding it. To assist the discussion, I put the full opaque structure
> here. Once community gets consensus on using opaque structure for
> iommu tlb invalidate propagation, I'm glad to work with you on a
> structure with partial opaque since there seems to be overlap across
> arch.

I see, thanks for the pointer. I'm not a fan of using the pIOMMU format in
invalidations, but I understand the appeal in your case, where you can
shave off the overhead of parsing/re-assembling the pIOMMU format.

It's not suitable for generic io-pgtable, where different pIOMMUs may
offer the same page table format but different invalidation methods. I
guess I could make it work by negotiating invalidation method at bind
time, falling back to the generic format for virtio-iommu. We already have
to do some related negotiation for SVM on SMMU, where in embedded
implementations the guest doesn't need to send invalidation requests via
IOMMU queue, but can do it using CPU instructions instead.

>> For what it's worth, when prototyping the paravirtualized IOMMU I came up
>> with the following.
>>
>> (From the paravirtualized POV, the SMMU also has to swizzle endianess
>> after unpacking an opaque structure, since userspace doesn't know what's
>> in it and guest might use a different endianess. So we need to force all
>> opaque data to be e.g. little-endian.)
>>
>> struct vfio_iommu_tlb_invalidate {
>> 	__u32	argsz;
>> 	__u32	scope;
>> 	__u32	flags;
>> 	__u32	pasid;
>> 	__u64	vaddr;
>> 	__u64	size;
>> 	__u8	data[];
>> };
>>
>> Scope is a bitfields restricting the invalidation scope. By default
>> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
>> and @size are unused.
>>
>> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
>> scope to the pasid described by @pasid.
>> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
>> scope to the address range described by (@vaddr, @size).
>>
>> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
>> range for *all* pasids (as well as no_pasid). Setting scope =
>> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
>> the VA range only for @pasid.
>>
> 
> Besides VA range flusing, there is PASID Cache flushing on VT-d. How about
> SMMU? So I think besides the two scope you defined, may need one more to
> indicate if it's PASID Cache flushing.

Yes, invalidating all TLB entries associated to a PASID would be done by
setting scope = VFIO_IOMMU_INVALIDATE_PASID. In which case field @pasid is
valid, and fields (@vaddr, @size) aren't used.

Thanks,
Jean

>> Flags depend on the selected scope:
>>
>> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
>> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
>> exclusively (some architectures, e.g. SMMU, allow this)
>>
>> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
>> to invalidate all intermediate tables cached as part of the PTW for vaddr,
>> only the last-level entry (pte). This is a hint.
>>
>> I guess what's missing for Intel IOMMU and would go in @data is the
>> "global" hint (which we don't have in SMMU invalidations). Do you see
>> anything else, that the pIOMMU cannot deduce from this structure?
>>
> 
> For Intel platform, Drain read/write would be needed in the opaque.
> 
> Thanks,
> Yi L
> 
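
To make the scope semantics discussed above concrete, here is a minimal
user-space sketch. It is not the proposed uapi: the two scope bits use the
values given in the thread, while struct inv_req and inv_req_covers() are
hypothetical names introduced only for illustration. The flags field and
non-PASID mappings are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scope bits with the values given in the message above. */
#define VFIO_IOMMU_INVALIDATE_PASID	(1u << 0)
#define VFIO_IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Fixed part of the proposed structure; the trailing data[] is left out. */
struct inv_req {
	uint32_t argsz;
	uint32_t scope;
	uint32_t flags;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
};

/*
 * Hypothetical helper: does @req cover a cached translation tagged with
 * @pasid for address @va?  With no scope bit set, everything is covered.
 */
static bool inv_req_covers(const struct inv_req *req, uint32_t pasid,
			   uint64_t va)
{
	/* PASID scope: only the PASID named in the request is affected. */
	if ((req->scope & VFIO_IOMMU_INVALIDATE_PASID) && req->pasid != pasid)
		return false;

	/* VADDR scope: only addresses in [vaddr, vaddr + size) are affected. */
	if ((req->scope & VFIO_IOMMU_INVALIDATE_VADDR) &&
	    (va < req->vaddr || va - req->vaddr >= req->size))
		return false;

	return true;
}
```

With scope = 0 the helper returns true for every (pasid, va) pair, which
matches the "invalidate the whole container" default described above.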


* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-05-15 12:14           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-15 12:14 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, jasowang,
	qemu-devel, kevin.tian, ashok.raj, jacob.jun.pan, tianyu.lan

On 14/05/17 11:12, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
>> Hi Yi,
>>
>> On 26/04/17 11:12, Liu, Yi L wrote:
>>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
>>>
>>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
>>> invalidate request from guest to host.
>>>
>>> In the case of SVM virtualization on VT-d, host IOMMU driver has
>>> no knowledge of caching structure updates unless the guest
>>> invalidation activities are passed down to the host. So a new
>>> IOCTL is needed to propagate the guest cache invalidation through
>>> VFIO.
>>>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  include/uapi/linux/vfio.h | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 6b97987..50c51f8 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>>>  
>>>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>>>  
>>> +/* For IOMMU TLB Invalidation Propagation */
>>> +struct vfio_iommu_tlb_invalidate {
>>> +	__u32	argsz;
>>> +	__u32	length;
>>> +	__u8	data[];
>>> +};
>>
>> We initially discussed something a little more generic than this, with
>> most info explicitly described and only pIOMMU-specific quirks and hints
>> in an opaque structure. Out of curiosity, why the change? I'm not against
>> a fully opaque structure, but there seem to be a large overlap between TLB
>> invalidations across architectures.
> 
> Hi Jean,
> 
> As my cover letter mentioned, it is an open on the iommu tlb invalidate
> propagation. Paste it here since it's in the cover letter for Qemu part
> changes. Pls refer to the [Open] session in the following link.
> 
> http://www.spinics.net/lists/kvm/msg148798.html
> 
> I want to see if community wants to have opaque structure or not
> on iommu tlb invalidate propagation. Personally, I incline to use
> opaque structure. But it's better to gather the comments before
> deciding it. To assist the discussion, I put the full opaque structure
> here. Once community gets consensus on using opaque structure for
> iommu tlb invalidate propagation, I'm glad to work with you on a
> structure with partial opaque since there seems to be overlap across
> arch.

I see, thanks for the pointer. I'm not a fan of using the pIOMMU format in
invalidations, but I understand the appeal in your case, where you can
shave off the overhead of parsing/re-assembling the pIOMMU format.

It's not suitable for generic io-pgtable, where different pIOMMUs may
offer the same page table format but different invalidation methods. I
guess I could make it work by negotiating invalidation method at bind
time, falling back to the generic format for virtio-iommu. We already have
to do some related negotiation for SVM on SMMU, where in embedded
implementations the guest doesn't need to send invalidation requests via
IOMMU queue, but can do it using CPU instructions instead.

>> For what it's worth, when prototyping the paravirtualized IOMMU I came up
>> with the following.
>>
>> (From the paravirtualized POV, the SMMU also has to swizzle endianess
>> after unpacking an opaque structure, since userspace doesn't know what's
>> in it and guest might use a different endianess. So we need to force all
>> opaque data to be e.g. little-endian.)
>>
>> struct vfio_iommu_tlb_invalidate {
>> 	__u32	argsz;
>> 	__u32	scope;
>> 	__u32	flags;
>> 	__u32	pasid;
>> 	__u64	vaddr;
>> 	__u64	size;
>> 	__u8	data[];
>> };
>>
>> Scope is a bitfields restricting the invalidation scope. By default
>> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
>> and @size are unused.
>>
>> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
>> scope to the pasid described by @pasid.
>> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
>> scope to the address range described by (@vaddr, @size).
>>
>> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
>> range for *all* pasids (as well as no_pasid). Setting scope =
>> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
>> the VA range only for @pasid.
>>
> 
> Besides VA range flusing, there is PASID Cache flushing on VT-d. How about
> SMMU? So I think besides the two scope you defined, may need one more to
> indicate if it's PASID Cache flushing.

Yes, invalidating all TLB entries associated to a PASID would be done by
setting scope = VFIO_IOMMU_INVALIDATE_PASID. In which case field @pasid is
valid, and fields (@vaddr, @size) aren't used.

Thanks,
Jean

>> Flags depend on the selected scope:
>>
>> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
>> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
>> exclusively (some architectures, e.g. SMMU, allow this)
>>
>> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
>> to invalidate all intermediate tables cached as part of the PTW for vaddr,
>> only the last-level entry (pte). This is a hint.
>>
>> I guess what's missing for Intel IOMMU and would go in @data is the
>> "global" hint (which we don't have in SMMU invalidations). Do you see
>> anything else, that the pIOMMU cannot deduce from this structure?
>>
> 
> For Intel platform, Drain read/write would be needed in the opaque.
> 
> Thanks,
> Yi L
> 


* Re: [RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function
  2017-05-12 21:59       ` [Qemu-devel] " Alex Williamson
@ 2017-05-15 13:14           ` jacob pan
  -1 siblings, 0 replies; 116+ messages in thread
From: jacob pan @ 2017-05-15 13:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w, Liu, Yi L,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Fri, 12 May 2017 15:59:29 -0600
Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> > +	if (pasidt_binfo->size >= intel_iommu_get_pts(iommu)) {
> > +		pr_err("Invalid gPASID table size %llu, host size
> > %lu\n",
> > +			pasidt_binfo->size,
> > +			intel_iommu_get_pts(iommu));
> > +		ret = -EINVAL;
> > +		goto out;  
> 
> equal is not valid?

you are right, equal is valid. I was thinking of shared PASID space
between guest and host but that is not the case here.
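
A minimal sketch of the corrected comparison, assuming the guest table size
and the host limit are expressed in the same unit.
check_gpasid_table_size() is a hypothetical helper, not a function from the
patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a guest PASID table that is no larger than
 * what the host supports.  A guest size equal to the host size is valid.
 */
static int check_gpasid_table_size(uint64_t guest_size, uint64_t host_size)
{
	if (guest_size > host_size)
		return -EINVAL;
	return 0;
}
```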

The rest of your comments are taken too, thanks for the review.

Jacob


* Re: [Qemu-devel] [RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function
@ 2017-05-15 13:14           ` jacob pan
  0 siblings, 0 replies; 116+ messages in thread
From: jacob pan @ 2017-05-15 13:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, kvm, iommu, peterx, jasowang, qemu-devel, kevin.tian,
	ashok.raj, tianyu.lan, jean-philippe.brucker, Jacob Pan, Liu,
	Yi L, jacob.jun.pan

On Fri, 12 May 2017 15:59:29 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> > +	if (pasidt_binfo->size >= intel_iommu_get_pts(iommu)) {
> > +		pr_err("Invalid gPASID table size %llu, host size
> > %lu\n",
> > +			pasidt_binfo->size,
> > +			intel_iommu_get_pts(iommu));
> > +		ret = -EINVAL;
> > +		goto out;  
> 
> equal is not valid?

you are right, equal is valid. I was thinking of shared PASID space
between guest and host but that is not the case here.

The rest of your comments are taken too, thanks for the review.

Jacob


* Re: [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function
  2017-05-12 21:59       ` [Qemu-devel] " Alex Williamson
@ 2017-05-17 10:23           ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-17 10:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w

On Fri, May 12, 2017 at 03:59:24PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:00 +0800
> "Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> 

Hi Alex,

Please refer to the open question I raise later in this email; I need your
comments on it to prepare the formal patchset for SVM virtualization. Thx.

> > From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > 
> > When a SVM capable device is assigned to a guest, the first level page
> > tables are owned by the guest and the guest PASID table pointer is
> > linked to the device context entry of the physical IOMMU.
> > 
> > Host IOMMU driver has no knowledge of caching structure updates unless
> > the guest invalidation activities are passed down to the host. The
> > primary usage is derived from emulated IOMMU in the guest, where QEMU
> > can trap invalidation activities before pass them down the
> > host/physical IOMMU. There are IOMMU architectural specific actions
> > need to be taken which requires the generic APIs introduced in this
> > patch to have opaque data in the tlb_invalidate_info argument.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > ---
> >  drivers/iommu/iommu.c | 13 +++++++++++++
> >  include/linux/iommu.h | 16 ++++++++++++++++
> >  2 files changed, 29 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index f2da636..ca7cff2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> >  
> > +int iommu_do_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +	int ret = 0;
> > +
> > +	if (unlikely(domain->ops->do_invalidate == NULL))
> > +		return -ENODEV;
> > +
> > +	ret = domain->ops->do_invalidate(domain, dev, inv_info);
> > +	return ret;
> 
> nit, ret is unnecessary.

yes, would modify it. Thx.
 
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_do_invalidate);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 491a011..a48e3b75 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -140,6 +140,11 @@ struct pasid_table_info {
> >  	__u8	opaque[];/* IOMMU-specific details */
> >  };
> >  
> > +struct tlb_invalidate_info {
> > +	__u32	model;
> > +	__u8	opaque[];
> > +};
> 
> I'm wondering if 'model' is really necessary here, shouldn't this
> function only be called if a bind_pasid_table() succeeded, and then the
> model would be set at that time?

For this model field, I'm thinking about another potential usage, which
comes from Tianyu's idea of using tlb_invalidate_info to pass invalidations
for IOVA-related mappings. In that case there would be no bind_pasid_table()
before it, so a model check would be needed. But I may remove it, since this
patchset focuses on SVM.

Here, I have an open question to check with you. I defined
tlb_invalidate_info with fully opaque data. The opaque data would include
the invalidation info for different vendors. But we have two choices for the
tlb_invalidate_info definition.

a) as proposed in this patchset, pass raw data to the host. The host pIOMMU
   driver submits the invalidation request after replacing specific fields,
   and rejects it if the IOMMU model does not match.
   * Pros: no need to parse and re-assemble, so better performance
   * Cons: unable to support scenarios that emulate an Intel IOMMU
           on an ARM platform.
b) parse the invalidation info into specific data, e.g. granularity, addr,
   size, invalidation type etc., then fill the data into a generic
   structure. In the host, the pIOMMU driver re-assembles the invalidation
   request and submits it to the pIOMMU.
   * Pros: may be able to support the scenario above. But this is still in
           question, since different vendors may have vendor-specific
           invalidation info, which would make it difficult to have a
           vendor-agnostic invalidation propagation API.
   * Cons: needs additional complexity to parse and re-assemble.
           The generic structure would be a superset of all possible
           invalidation info, which may be hard to maintain in the future.

As the pros/cons show, I proposed a) as an initial version. But it is still
open. Jean from ARM has given some comments on it and leans toward the opaque
approach with the generic part defined explicitly. Jean's reply is in the
link below.

http://www.spinics.net/lists/kvm/msg149884.html

I'd like to see your comments on it before moving forward. I'm fine with
Jean's idea. For VT-d, I may define it as "generic part" + "raw data".

Thanks,
Yi L
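
One way to picture the "generic part" + "raw data" layout is the sketch
below. It is an assumption for discussion only, not the structure from the
patch: struct tlb_inv_sketch, its field names and tlb_inv_check() are
hypothetical, and the model values reuse the INTEL_IOMMU/ARM_SMMU
definitions from patch 1/8. Returning -EINVAL on a model mismatch is also an
assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Model values as defined for pasid_table_info in patch 1/8. */
#define INTEL_IOMMU	(1 << 0)
#define ARM_SMMU	(1 << 1)

/* Hypothetical layout: explicit generic fields followed by raw data. */
struct tlb_inv_sketch {
	uint32_t model;		/* emulated IOMMU model */
	uint32_t opaque_len;	/* bytes of model-specific data */
	uint32_t pasid;		/* generic part */
	uint32_t flags;		/* generic part */
	uint64_t addr;		/* generic part */
	uint64_t size;		/* generic part */
	uint8_t  opaque[];	/* raw, model-specific data */
};

/*
 * Hypothetical check a host driver could run before touching the raw data:
 * the buffer must hold the header, the model must match the host IOMMU,
 * and the raw data must fit inside the buffer.
 */
static int tlb_inv_check(const struct tlb_inv_sketch *inv, size_t buf_len,
			 uint32_t host_model)
{
	const size_t hdr = offsetof(struct tlb_inv_sketch, opaque);

	if (buf_len < hdr)
		return -EINVAL;		/* header truncated */
	if (inv->model != host_model)
		return -EINVAL;		/* e.g. Intel data sent to an SMMU */
	if (inv->opaque_len > buf_len - hdr)
		return -EINVAL;		/* raw data runs past the buffer */
	return 0;
}
```

The model comparison corresponds to the "reject if the IOMMU model is not
correct" step of option a); the generic fields are what option b) would
carry explicitly.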

> This also needs to be uapi since you're expecting a user to provide it
> to vfio.  The opaque data needs to be fully specified (relative to
> uapi) per model.
> 

Will do as you pointed out.
 
> > +
> >  #ifdef CONFIG_IOMMU_API
> >  
> >  /**
> > @@ -215,6 +220,8 @@ struct iommu_ops {
> >  				struct pasid_table_info *pasidt_binfo);
> >  	int (*unbind_pasid_table)(struct iommu_domain *domain,
> >  				struct device *dev);
> > +	int (*do_invalidate)(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info *inv_info);
> >  
> >  	unsigned long pgsize_bitmap;
> >  };
> > @@ -240,6 +247,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> >  		struct device *dev, struct pasid_table_info *pasidt_binfo);
> >  extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
> >  				struct device *dev);
> > +extern int iommu_do_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info *inv_info);
> > +
> >  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> >  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> >  		     phys_addr_t paddr, size_t size, int prot);
> > @@ -626,6 +636,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
> >  	return -EINVAL;
> >  }
> >  
> > +static inline int iommu_do_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +	return -EINVAL;
> > +}
> > +
> >  #endif /* CONFIG_IOMMU_API */
> >  
> >  #endif /* __LINUX_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function
  2017-05-12 21:59       ` [Qemu-devel] " Alex Williamson
@ 2017-05-17 10:24           ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-17 10:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On Fri, May 12, 2017 at 03:59:18PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:01 +0800
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > This patch adds Intel VT-d specific function to implement
> > iommu_do_invalidate API.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the guest
> > to the physical IOMMU.
> > 
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submit to the
> > invalidation queue.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/intel-iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h | 11 +++++++++++
> >  2 files changed, 54 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 6d5b939..0b098ad 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
> >  	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
> >  }
> >  
> > +static int intel_iommu_do_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +	int ret = 0;
> > +	struct intel_iommu *iommu;
> > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +	struct intel_invalidate_data *inv_data;
> > +	struct qi_desc *qi;
> > +	u16 did;
> > +	u8 bus, devfn;
> > +
> > +	if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
> > +		return -EINVAL;
> > +
> > +	iommu = device_to_iommu(dev, &bus, &devfn);
> > +	if (!iommu)
> > +		return -ENODEV;
> > +
> > +	inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
> > +
> > +	/* check SID */
> > +	if (PCI_DEVID(bus, devfn) != inv_data->sid)
> > +		return 0;
> > +
> > +	qi = &inv_data->inv_desc;
> > +
> > +	switch (qi->low & QI_TYPE_MASK) {
> > +	case QI_DIOTLB_TYPE:
> > +	case QI_DEIOTLB_TYPE:
> > +		/* for device IOTLB, we just let it pass through */
> > +		break;
> > +	default:
> > +		did = dmar_domain->iommu_did[iommu->seq_id];
> > +		set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
> > +		break;
> > +	}
> > +
> > +	ret = qi_submit_sync(qi, iommu);
> > +
> > +	return ret;
> 
> nit, ret variable is unnecessary.

Yes, will remove it.
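For readers following the quoted function, the field replacement it performs on the descriptor's low word can be sketched in userspace C as below. The mask and DID macros mirror the ones in the quoted patch; the two device-IOTLB type codes (0x3 and 0x8) are my recollection of include/linux/intel-iommu.h and should be treated as assumptions, as should all `EX_`/`ex_` names.

```c
#include <stdint.h>

/* 64-bit equivalent of the kernel's GENMASK(h, l), for illustration only. */
#define EX_GENMASK64(h, l)  (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Mirrors of the macros added by the quoted patch. */
#define EX_QI_TYPE_MASK     EX_GENMASK64(3, 0)
#define EX_QI_DID_MASK      EX_GENMASK64(31, 16)
#define EX_QI_DID(did)      ((((uint64_t)(did)) & 0xffff) << 16)

/* Assumed descriptor type codes for device-IOTLB invalidations. */
#define EX_QI_DIOTLB_TYPE   0x3
#define EX_QI_DEIOTLB_TYPE  0x8

/*
 * Return the low word to submit to the physical invalidation queue:
 * device-IOTLB descriptors pass through unchanged, everything else has
 * the guest domain ID replaced with the host domain ID.
 */
static uint64_t ex_fixup_desc_low(uint64_t low, uint16_t host_did)
{
	switch (low & EX_QI_TYPE_MASK) {
	case EX_QI_DIOTLB_TYPE:
	case EX_QI_DEIOTLB_TYPE:
		return low;
	default:
		return (low & ~EX_QI_DID_MASK) | EX_QI_DID(host_did);
	}
}
```

The sketch returns a new value instead of modifying the descriptor in place, which keeps the guest-provided data untouched; the patch itself updates `qi->low` directly with `set_mask_bits()`.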
 
> > +}
> > +
> >  static int intel_iommu_map(struct iommu_domain *domain,
> >  			   unsigned long iova, phys_addr_t hpa,
> >  			   size_t size, int iommu_prot)
> > @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> >  	.bind_pasid_table	= intel_iommu_bind_pasid_table,
> >  	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
> > +	.do_invalidate		= intel_iommu_do_invalidate,
> >  #endif
> >  	.map		= intel_iommu_map,
> >  	.unmap		= intel_iommu_unmap,
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index ac04f28..9d6562c 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -29,6 +29,7 @@
> >  #include <linux/dma_remapping.h>
> >  #include <linux/mmu_notifier.h>
> >  #include <linux/list.h>
> > +#include <linux/bitops.h>
> >  #include <asm/cacheflush.h>
> >  #include <asm/iommu.h>
> >  
> > @@ -271,6 +272,10 @@ enum {
> >  #define QI_PGRP_RESP_TYPE	0x9
> >  #define QI_PSTRM_RESP_TYPE	0xa
> >  
> > +#define QI_DID(did)		(((u64)did & 0xffff) << 16)
> > +#define QI_DID_MASK		GENMASK(31, 16)
> > +#define QI_TYPE_MASK		GENMASK(3, 0)
> > +
> >  #define QI_IEC_SELECTIVE	(((u64)1) << 4)
> >  #define QI_IEC_IIDEX(idx)	(((u64)(idx & 0xffff) << 32))
> >  #define QI_IEC_IM(m)		(((u64)(m & 0x1f) << 27))
> > @@ -529,6 +534,12 @@ struct intel_svm {
> >  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> >  #endif
> >  
> > +struct intel_invalidate_data {
> > +	u16 sid;
> > +	u32 pasid;
> > +	struct qi_desc inv_desc;
> > +};
> 
> This needs to be uapi since the vfio user is expected to create it, so
> we need a uapi version of qi_desc too.
>

Yes, will do.
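As an illustration of what a uapi form of this structure might look like, the sketch below uses fixed-width types and makes the padding after `sid` explicit, so that userspace and kernel agree on the layout. The `ex_` names, the `reserved` field, and the must-be-zero rule are assumptions; a real uapi header would use `__u16`/`__u32`/`__u64` from `<linux/types.h>` rather than `<stdint.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uapi mirror of struct qi_desc: two 64-bit words. */
struct ex_uapi_qi_desc {
	uint64_t low;
	uint64_t high;
};

/* Hypothetical uapi form of intel_invalidate_data with explicit padding. */
struct ex_uapi_invalidate_data {
	uint16_t sid;
	uint16_t reserved;  /* fills the hole after sid; assumed must-be-zero */
	uint32_t pasid;
	struct ex_uapi_qi_desc inv_desc;
};

/* Pin the layout at compile time so it cannot drift silently. */
_Static_assert(offsetof(struct ex_uapi_invalidate_data, pasid) == 4,
	       "pasid must sit at offset 4");
_Static_assert(offsetof(struct ex_uapi_invalidate_data, inv_desc) == 8,
	       "inv_desc must sit at offset 8");
_Static_assert(sizeof(struct ex_uapi_invalidate_data) == 24,
	       "structure must be 24 bytes");

/* Returns 1 if the reserved field is clear, 0 otherwise. */
static int ex_invalidate_data_valid(const struct ex_uapi_invalidate_data *d)
{
	return d->reserved == 0;
}
```

Rejecting non-zero reserved bytes leaves room to assign them a meaning later without breaking existing users.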

Thx,
Yi L
 
> > +
> >  extern const struct attribute_group *intel_iommu_groups[];
> >  extern void intel_iommu_debugfs_init(void);
> >  extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-05-12 21:58       ` [Qemu-devel] " Alex Williamson
@ 2017-05-17 10:27           ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-17 10:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:02 +0800
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> > 
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> > 
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there is a discussion on it with
> > ARM engineer. It can be found by the link below. This IOCTL cmd may
> > support SVM PASID bind request from userspace driver, or page table(cr3)
> > bind request from guest. These SVM bind requests would be supported by
> > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> > support page table bind from guest.
> > 
> > https://patchwork.kernel.org/patch/9594231/
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  include/uapi/linux/vfio.h | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +	__u32	argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> > +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > +	__u32	flags;
> > +	__u32	length;
> > +	__u8	data[];
> 
> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
> pasid_table_info?  So at a minimum this is a union including struct
> pasid_table_info.  Furthermore how does a user learn what the opaque
> data in struct pasid_table_info is without looking at the code?  A user
> API needs to be clear and documented, not opaque and variable.  We
> should also have references to the hardware spec for an Intel or ARM
> PASID table in uapi.  flags should be defined as they're used, let's
> not reserve them with the expectation of future use.
> 

Agreed, I will add a description accordingly. For the flags, I will remove
the last two since I don't use them. I think Jean will add them in his/her
patchset. Either way, one of us will need to merge the flags.
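To illustrate defining only the flag that is actually used, here is a small userspace sketch of how a handler might validate such a request. The structure mirrors the fields in the quoted patch, but the `ex_` names and the exact checks are assumptions, loosely modeled on the `argsz` convention of existing VFIO ioctls rather than taken from this series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Only the flag used by this series is defined. */
#define EX_SVM_BIND_PASIDTBL  (1u << 0)

/* Mirror of the header fields shown in the quoted patch. */
struct ex_vfio_device_svm {
	uint32_t argsz;   /* total size the caller provided, header included */
	uint32_t flags;
	uint32_t length;  /* bytes of payload in data[] */
	uint8_t  data[];
};

/* Validate the fixed header before touching the variable-length payload. */
static int ex_svm_validate(const struct ex_vfio_device_svm *svm)
{
	size_t minsz = offsetof(struct ex_vfio_device_svm, data);

	if (svm->argsz < minsz)
		return -EINVAL;            /* header itself is truncated */
	if (svm->flags != EX_SVM_BIND_PASIDTBL)
		return -EINVAL;            /* undefined or missing flag */
	if (svm->length > svm->argsz - minsz)
		return -EINVAL;            /* payload does not fit in argsz */
	return 0;
}
```

Because unknown flag bits are rejected, a later series can introduce `(1 << 1)` for a task bind without any ambiguity about how older kernels treat it.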

Thanks,
Yi L

> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
> > +				VFIO_SVM_BIND_PASID | \
> > +				VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >  
> >  /*
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation
  2017-05-17 10:27           ` Liu, Yi L
@ 2017-05-18 11:29               ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-18 11:29 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On 17/05/17 11:27, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
>> On Wed, 26 Apr 2017 18:12:02 +0800
>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>>>  
>>> +/* IOCTL for Shared Virtual Memory Bind */
>>> +struct vfio_device_svm {
>>> +	__u32	argsz;
>>> +#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
>>> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
>>> +#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
>>> +	__u32	flags;
>>> +	__u32	length;
>>> +	__u8	data[];
>>
>> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
>> pasid_table_info?  So at a minimum this is a union including struct
>> pasid_table_info.  Furthermore how does a user learn what the opaque
>> data in struct pasid_table_info is without looking at the code?  A user
>> API needs to be clear and documented, not opaque and variable.  We
>> should also have references to the hardware spec for an Intel or ARM
>> PASID table in uapi.  flags should be defined as they're used, let's
>> not reserve them with the expectation of future use.
>>
> 
> Agree. would add description accordingly. For the flags, I would remove
> the last two as I wouldn't use. I think Jean would add them in his/her
> patchset. Anyhow, one of us need to do merge on the flags.

Yes, I can add the VFIO_SVM_BIND_PASID (or rather _TASK) flag as (1 << 1)
in my series if it helps the merge. The PGTABLE flag is for another series
which I don't plan to send out anytime soon, since there already is enough
pending work on this.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-04-28 12:51           ` Jean-Philippe Brucker
@ 2017-05-23  7:50               ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-05-23  7:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
> On 28/04/17 10:04, Liu, Yi L wrote:
Hi Jean,

Sorry for the delayed response. I still have some follow-up comments on
per-device versus per-group binding. Please refer to my comments inline.

> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when a SVM capable PCI
> >>> device is assigned to a guest, nested mode is turned on. Guest owns the
> >>> first level page tables (request with PASID) and performs GVA->GPA
> >>> translation. Second level page tables are owned by the host for GPA->HPA
> >>> translation for both request with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to IOMMU APIs. Architecture specific IOMMU function can be added later
> >>> to perform the specific steps for binding pasid table of assigned devices.
> >>>
> >>> This patch also adds model definition in iommu.h. It would be used to
> >>> check if the bind request is from a compatible entity. e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >>> ---
> >>>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >>>  include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>  
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> >>> +			struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> > 
> > Virtual address space is not tied to protection domain as I/O virtual address
> > space does. Is it really necessary to affect all the devices in this group.
> > Or it is just for consistence?
> 
> It's mostly about consistency, and also avoid hiding implicit behavior in
> the IOMMU driver. I have the following example, described using group and
> domain structures from the IOMMU API:
>                  ____________________
>                 |IOMMU  ____________ |
>                 |      |DOM  ______ ||
>                 |      |    |GRP   |||     bind
>                 |      |    |    A<-----------------Task 1
>                 |      |    |    B |||
>                 |      |    |______|||
>                 |      |     ______ ||
>                 |      |    |GRP   |||
>                 |      |    |    C |||
>                 |      |    |______|||
>                 |      |____________||
>                 |       ____________ |
>                 |      |DOM  ______ ||
>                 |      |    |GRP   |||
>                 |      |    |    D |||
>                 |      |    |______|||
>                 |      |____________||
>                 |____________________|
> 
> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
> to some hardware limitation (in the bus, the device or the IOMMU), B can
> see all DMA transactions issued by A. A and B are therefore in the same
> IOMMU group. C and D can be isolated by the IOMMU, so they each have their
> own group.
> 
> (As far as I know, in the SVM world at the moment, devices are neatly
> integrated and there is no need for putting multiple devices in the same
> IOMMU group, but I don't think we should expect all future SVM systems to
> be well-behaved.)
>
> So when a user binds Task 1 to device A, it is *implicitly* giving device
> B access to Task 1 as well. Simply because the IOMMU is unable to isolate
> A from B, PASID or not. B could access the same address space as A, even
> if you don't call bind again to explicitly attach the PASID table to B.
> 
> If the bind is done with device as argument, maybe users will believe that
> using PASIDs provides an additional level of isolation within a group,
> when it really doesn't. That's why I'm inclined to have the whole bind API
> be on groups rather than devices, if only for clarity.

This may depend on how the user understands isolation. I think a different
PASID does mean a different address space. From this perspective, it does
look like isolation.

> But I don't know, maybe a comment explaining the above would be sufficient.
> 
> To be frank my comment about group versus device is partly to make sure
> that I grasp the various concepts correctly and that we're on the same
> page. Doing the bind on groups is less significant in your case, for PASID
> table binding, because VFIO already takes care of IOMMU group properly. In
> my case I expect DRM, network, DMA drivers to use the API as well for
> binding tasks, and I don't want to introduce ambiguity in the API that
> would lead to security holes later.

For this part, could you provide more detail on why binding at group level
would be more significant in your case? I think we need a strong reason to
support it. Currently, the other map_page APIs take a device as argument.
Would it also be recommended to use a group as argument?

Thanks,
Yi L

> >> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> >> might be simpler to let the IOMMU core take the group lock and do
> >> group->domain->ops->bind_task(dev...) for each device. The question also
> >> holds for iommu_do_invalidate in patch 3/8.
> > 
> > In my understanding, it is moving the for_each_dev loop into iommu driver?
> > Is it?
> 
> Yes, that's what I meant
> 
> >> This way the prototypes would be:
> >> int iommu_bind...(struct iommu_group *group, struct ... *info)
> >> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> >> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
> > 
> > For PASID table binding from guest, I think it'd better to be per-device op
> > since the bind operation wants to modify the host context entry. But we may
> > still share the API and do things differently in iommu driver.
> 
> Sure, as said above the use cases for PASID table and single PASID binding
> are different, sharing the API is not strictly necessary.
> 
> > For invalidation, I think it'd better to be per-group. Actually, with guest
> > IOMMU exists, there is only one group in a domain on Intel platform. Do it for
> > each device is not expected. How about it on ARM?
> 
> In ARM systems with the DMA API (IOMMU_DOMAIN_DMA), there is one group per
> domain. But with VFIO (IOMMU_DOMAIN_UNMANAGED), VFIO will try to attach
> multiple groups in the same container to the same domain when possible.
> 
> >> For PASID table binding it might not matter much, as VFIO will most likely
> >> be the only user. But task binding will be called by device drivers, which
> >> by now should be encouraged to do things at iommu_group granularity.
> >> Alternatively it could be done implicitly like in iommu_attach_device,
> >> with "iommu_bind_device_x" calling "iommu_bind_group_x".
> > 
> > Do you mean the bind task from userspace driver? I guess you're trying to do
> > different types of binding request in a single svm_bind API?
> > 
> >>
> >> Extending this reasoning, since groups in a domain are also supposed to
> >> have the same mappings, then similarly to map/unmap,
> >> bind/unbind/invalidate should really be done with an iommu_domain (and
> >> nothing else) as target argument. However this requires the IOMMU core to
> >> keep a group list in each domain, which might complicate things a little
> >> too much.
> >>
> >> But "all devices in a domain share the same PASID table" is the paradigm
> >> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> >> iommu_group, it should be made more explicit to users, so they don't
> >> assume that devices within a domain are isolated from each others with
> >> regard to PASID DMA.
> > 
> > Is the isolation you mentioned means forbidding to do PASID DMA to the same
> > virtual address space when the device comes from different domain?
> 
> In the above example, devices A, B and C are in the same IOMMU domain
> (because, for instance, user put the two groups in the same VFIO
> container.) Then in the SMMUv3 driver they would all share the same PASID
> table. A, B and C can access Task 1 with the PASID obtained during the
> depicted bind. They don't need to call bind again for device C, though it
> would be good practice.
> 
> But D is in a different domain, so unless you also call bind on Task 1 for
> device D, there is no way that D can access Task 1.
> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
  2017-05-23  7:50               ` Liu, Yi L
@ 2017-05-25 12:33                 ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-25 12:33 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On 23/05/17 08:50, Liu, Yi L wrote:
> On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
[...]
>>>>
>>>> For the next version of my SVM series, I was thinking of passing group
>>>> instead of device to iommu_bind. Since all devices in a group are expected
>>>> to share the same mappings (whether they want it or not), users will have
>>>
>>> Virtual address space is not tied to protection domain as I/O virtual address
>>> space does. Is it really necessary to affect all the devices in this group.
>>> Or it is just for consistence?
>>
>> It's mostly about consistency, and also avoid hiding implicit behavior in
>> the IOMMU driver. I have the following example, described using group and
>> domain structures from the IOMMU API:
>>                  ____________________
>>                 |IOMMU  ____________ |
>>                 |      |DOM  ______ ||
>>                 |      |    |GRP   |||     bind
>>                 |      |    |    A<-----------------Task 1
>>                 |      |    |    B |||
>>                 |      |    |______|||
>>                 |      |     ______ ||
>>                 |      |    |GRP   |||
>>                 |      |    |    C |||
>>                 |      |    |______|||
>>                 |      |____________||
>>                 |       ____________ |
>>                 |      |DOM  ______ ||
>>                 |      |    |GRP   |||
>>                 |      |    |    D |||
>>                 |      |    |______|||
>>                 |      |____________||
>>                 |____________________|
>>
>> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
>> to some hardware limitation (in the bus, the device or the IOMMU), B can
>> see all DMA transactions issued by A. A and B are therefore in the same
>> IOMMU group. C and D can be isolated by the IOMMU, so they each have their
>> own group.
>>
>> (As far as I know, in the SVM world at the moment, devices are neatly
>> integrated and there is no need for putting multiple devices in the same
>> IOMMU group, but I don't think we should expect all future SVM systems to
>> be well-behaved.)
>>
>> So when a user binds Task 1 to device A, it is *implicitly* giving device
>> B access to Task 1 as well. Simply because the IOMMU is unable to isolate
>> A from B, PASID or not. B could access the same address space as A, even
>> if you don't call bind again to explicitly attach the PASID table to B.
>>
>> If the bind is done with device as argument, maybe users will believe that
>> using PASIDs provides an additional level of isolation within a group,
>> when it really doesn't. That's why I'm inclined to have the whole bind API
>> be on groups rather than devices, if only for clarity.
> 
> This may depend on how the user understand the isolation. I think different
> PASID does mean different address space. From this perspective, it does look
> like isolation.

Yes, and it isn't isolation. Not at device granularity, that is. The IOMMU
API has the concept of group because sometimes the hardware simply cannot
isolate devices. Different PASIDs do mean different address spaces, but two
devices in the same group may be able to access each other's address
spaces, regardless of the presence of a PASID.

To illustrate the problem with PASIDs, let's say that for whatever reason
(e.g. lack of ACS Source Validation in a PCI switch), device B (0x0100)
can spoof device A's RID (0x0200). Therefore we put A and B in the same
IOMMU group.

The user binds Task 1 to device A and Task 2 to device B. They use PASIDs X
and Y, so the user thinks that they are isolated. But given the physical
properties of the system, device B can pretend it is device A, and access
the whole address space of Task 1 by sending transactions with RID 0x0200
and PASID X. So the user has effectively created a backdoor between tasks 1
and 2 without knowing it, and using PASIDs didn't add any protection.
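
To make the scenario above concrete, here is a small userspace toy model.
It is not kernel code and not how any IOMMU driver is written; every name
in it (toy_binding, toy_translate, PASID_X, PASID_Y) is made up for
illustration, and the PASID values are arbitrary. The only assumption it
encodes is that the IOMMU picks an address space from the (RID, PASID)
pair carried by the transaction, and has no other way to tell who sent it.

```c
#include <stddef.h>
#include <stdint.h>

/* RIDs taken from the example above; PASID values are arbitrary. */
#define RID_A   0x0200
#define RID_B   0x0100
#define PASID_X 1
#define PASID_Y 2

/* Toy model: one entry per (RID, PASID) pair that has a task bound. */
struct toy_binding {
	uint16_t rid;   /* requester ID carried by the transaction */
	uint32_t pasid; /* PASID carried by the transaction */
	int task;       /* task whose address space the pair resolves to */
};

static const struct toy_binding toy_table[] = {
	{ RID_A, PASID_X, 1 }, /* Task 1 bound to device A */
	{ RID_B, PASID_Y, 2 }, /* Task 2 bound to device B */
};

/*
 * Return the task reached by a transaction tagged (rid, pasid), or -1
 * if the pair has no binding (a translation fault in this toy model).
 * Note that the lookup only sees the tag, never the real sender.
 */
static int toy_translate(uint16_t rid, uint32_t pasid)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		if (toy_table[i].rid == rid && toy_table[i].pasid == pasid)
			return toy_table[i].task;
	}
	return -1;
}
```

An honest device B sending (RID_B, PASID_X) faults, but if B can put
RID_A on the wire, toy_translate(RID_A, PASID_X) resolves to Task 1 no
matter which device issued the transaction.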

>> But I don't know, maybe a comment explaining the above would be sufficient.
>>
>> To be frank my comment about group versus device is partly to make sure
>> that I grasp the various concepts correctly and that we're on the same
>> page. Doing the bind on groups is less significant in your case, for PASID
>> table binding, because VFIO already takes care of IOMMU group properly. In
>> my case I expect DRM, network, DMA drivers to use the API as well for
>> binding tasks, and I don't want to introduce ambiguity in the API that
>> would lead to security holes later.
> 
> For this part, would you provide more detail about why it would be more
> significant to bind on group level in your case? I think we need strong
> reason to support it. Currently, the other map_page APIs are passing
> device as argument. Would it also be recommended to use group as argument?

Well, I'm only concerned about the API we're introducing at the moment; I'm
not suggesting we change existing ones. Because PASID is a new concept and
is currently unregulated, it would be good for new users to understand
what kind of isolation they are getting from it. And it matters more
than for previous APIs because SVM's main objective is to simplify the
userspace programming model, and therefore bring e.g. GPU programming to
users who will be more naive with regard to hardware properties and
limitations.

I'm thinking for instance about GPU drivers using the bind API to provide
OpenCL SVM to userspace. If the person writing the driver has to pass
IOMMU groups instead of devices, they will have less chance to fall into
the trap described above. They would have to follow the VFIO model, and
propagate the concept of IOMMU groups all the way to userspace.
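
As a rough sketch of what "bind on the group" could look like from the
caller's side, here is another userspace toy model. The toy_* structures
and functions are hypothetical stand-ins, not the real struct iommu_group
or any existing IOMMU API; the point is only that the caller hands over a
group and the loop over its devices happens in one place, so nobody can
bind a single device and assume its group-mates were left out.

```c
#include <stddef.h>

#define TOY_MAX_DEVS 4

/* Hypothetical stand-in for a device; 0 means no task is bound. */
struct toy_dev {
	const char *name;
	int bound_task;
};

/* Hypothetical stand-in for an IOMMU group: devices that cannot be
 * isolated from each other. */
struct toy_group {
	struct toy_dev *devs[TOY_MAX_DEVS];
	size_t ndevs;
};

/* Per-device step, standing in for a driver callback. */
static int toy_bind_dev(struct toy_dev *dev, int task)
{
	if (!dev || task <= 0)
		return -1;
	dev->bound_task = task;
	return 0;
}

/*
 * Bind a task to every device in the group. On failure, clear the
 * binding on the devices already handled (a simplification: previous
 * bindings are not restored) and return the error.
 */
static int toy_bind_group(struct toy_group *group, int task)
{
	size_t i;

	if (!group || group->ndevs > TOY_MAX_DEVS)
		return -1;

	for (i = 0; i < group->ndevs; i++) {
		int ret = toy_bind_dev(group->devs[i], task);

		if (ret) {
			while (i--)
				group->devs[i]->bound_task = 0;
			return ret;
		}
	}
	return 0;
}
```

With A and B in one toy_group, a single toy_bind_group() call marks both
as bound to the task, which matches what the hardware allows anyway.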

As I said, maybe we can just add a comment warning future users about the
limitations of PASID isolation and that will be enough; I really don't
know what's best. Since VFIO will likely stay the only user of PASID
table binding and handles IOMMU groups well, I think it boils down to
a stylistic decision in your case.

Thanks,
Jean

>>>> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
>>>> might be simpler to let the IOMMU core take the group lock and do
>>>> group->domain->ops->bind_task(dev...) for each device. The question also
>>>> holds for iommu_do_invalidate in patch 3/8.
>>>
>>> In my understanding, it is moving the for_each_dev loop into iommu driver?
>>> Is it?
>>
>> Yes, that's what I meant
>>
>>>> This way the prototypes would be:
>>>> int iommu_bind...(struct iommu_group *group, struct ... *info)
>>>> int iommu_unbind...(struct iommu_group *group, struct ...*info)
>>>> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
>>>
>>> For PASID table binding from guest, I think it'd better to be per-device op
>>> since the bind operation wants to modify the host context entry. But we may
>>> still share the API and do things differently in iommu driver.
>>
>> Sure, as said above the use cases for PASID table and single PASID binding
>> are different, sharing the API is not strictly necessary.
>>
>>> For invalidation, I think it'd better to be per-group. Actually, with guest
>>> IOMMU exists, there is only one group in a domain on Intel platform. Do it for
>>> each device is not expected. How about it on ARM?
>>
>> In ARM systems with the DMA API (IOMMU_DOMAIN_DMA), there is one group per
>> domain. But with VFIO (IOMMU_DOMAIN_UNMANAGED), VFIO will try to attach
>> multiple groups in the same container to the same domain when possible.
>>
>>>> For PASID table binding it might not matter much, as VFIO will most likely
>>>> be the only user. But task binding will be called by device drivers, which
>>>> by now should be encouraged to do things at iommu_group granularity.
>>>> Alternatively it could be done implicitly like in iommu_attach_device,
>>>> with "iommu_bind_device_x" calling "iommu_bind_group_x".
>>>
>>> Do you mean the bind task from userspace driver? I guess you're trying to do
>>> different types of binding request in a single svm_bind API?
>>>
>>>>
>>>> Extending this reasoning, since groups in a domain are also supposed to
>>>> have the same mappings, then similarly to map/unmap,
>>>> bind/unbind/invalidate should really be done with an iommu_domain (and
>>>> nothing else) as target argument. However this requires the IOMMU core to
>>>> keep a group list in each domain, which might complicate things a little
>>>> too much.
>>>>
>>>> But "all devices in a domain share the same PASID table" is the paradigm
>>>> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
>>>> iommu_group, it should be made more explicit to users, so they don't
>>>> assume that devices within a domain are isolated from each others with
>>>> regard to PASID DMA.
>>>
>>> Is the isolation you mentioned means forbidding to do PASID DMA to the same
>>> virtual address space when the device comes from different domain?
>>
>> In the above example, devices A, B and C are in the same IOMMU domain
>> (because, for instance, user put the two groups in the same VFIO
>> container.) Then in the SMMUv3 driver they would all share the same PASID
>> table. A, B and C can access Task 1 with the PASID obtained during the
>> depicted bind. They don't need to call bind again for device C, though it
>> would be good practice.
>>
>> But D is in a different domain, so unless you also call bind on Task 1 for
>> device D, there is no way that D can access Task 1.
>>
>> Thanks,
>> Jean
>>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
@ 2017-05-25 12:33                 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-05-25 12:33 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, tianyu.lan,
	kevin.tian, Jacob Pan, ashok.raj, jasowang, qemu-devel,
	jacob.jun.pan

On 23/05/17 08:50, Liu, Yi L wrote:
> On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
[...]
>>>>
>>>> For the next version of my SVM series, I was thinking of passing group
>>>> instead of device to iommu_bind. Since all devices in a group are expected
>>>> to share the same mappings (whether they want it or not), users will have
>>>
>>> Virtual address space is not tied to protection domain as I/O virtual address
>>> space does. Is it really necessary to affect all the devices in this group.
>>> Or it is just for consistence?
>>
>> It's mostly about consistency, and also avoid hiding implicit behavior in
>> the IOMMU driver. I have the following example, described using group and
>> domain structures from the IOMMU API:
>>                  ____________________
>>                 |IOMMU  ____________ |
>>                 |      |DOM  ______ ||
>>                 |      |    |GRP   |||     bind
>>                 |      |    |    A<-----------------Task 1
>>                 |      |    |    B |||
>>                 |      |    |______|||
>>                 |      |     ______ ||
>>                 |      |    |GRP   |||
>>                 |      |    |    C |||
>>                 |      |    |______|||
>>                 |      |____________||
>>                 |       ____________ |
>>                 |      |DOM  ______ ||
>>                 |      |    |GRP   |||
>>                 |      |    |    D |||
>>                 |      |    |______|||
>>                 |      |____________||
>>                 |____________________|
>>
>> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
>> to some hardware limitation (in the bus, the device or the IOMMU), B can
>> see all DMA transactions issued by A. A and B are therefore in the same
>> IOMMU group. C and D can be isolated by the IOMMU, so they each have their
>> own group.
>>
>> (As far as I know, in the SVM world at the moment, devices are neatly
>> integrated and there is no need for putting multiple devices in the same
>> IOMMU group, but I don't think we should expect all future SVM systems to
>> be well-behaved.)
>>
>> So when a user binds Task 1 to device A, it is *implicitly* giving device
>> B access to Task 1 as well. Simply because the IOMMU is unable to isolate
>> A from B, PASID or not. B could access the same address space as A, even
>> if you don't call bind again to explicitly attach the PASID table to B.
>>
>> If the bind is done with device as argument, maybe users will believe that
>> using PASIDs provides an additional level of isolation within a group,
>> when it really doesn't. That's why I'm inclined to have the whole bind API
>> be on groups rather than devices, if only for clarity.
> 
> This may depend on how the user understand the isolation. I think different
> PASID does mean different address space. From this perspective, it does look
> like isolation.

Yes, and it isn't isolation. Not at device granularity, that is. The IOMMU
API has the concept of group because sometimes the hardware simply cannot
isolate devices. Different PASIDs do mean different address spaces, but two
devices in the same group may be able to access each other's address
spaces, regardless of the presence of a PASID.

To illustrate the problem with PASIDs, let's say that for whatever reason
(e.g. lack of ACS Source Validation in a PCI switch), device B (0x0100)
can spoof device A's RID (0x0200). Therefore we put A and B in the same
IOMMU group.

The user binds Task 1 to device A and Task 2 to device B. They use PASIDs X
and Y, so the user thinks that they are isolated. But given the physical
properties of the system, device B can pretend it is device A, and access
the whole address space of Task 1 by sending transactions with RID 0x0200
and PASID X. So the user has effectively created a backdoor between tasks 1
and 2 without knowing it, and using PASIDs didn't add any protection.

>> But I don't know, maybe a comment explaining the above would be sufficient.
>>
>> To be frank my comment about group versus device is partly to make sure
>> that I grasp the various concepts correctly and that we're on the same
>> page. Doing the bind on groups is less significant in your case, for PASID
>> table binding, because VFIO already takes care of IOMMU group properly. In
>> my case I expect DRM, network, DMA drivers to use the API as well for
>> binding tasks, and I don't want to introduce ambiguity in the API that
>> would lead to security holes later.
> 
> For this part, would you provide more detail about why it would be more
> significant to bind on group level in your case? I think we need strong
> reason to support it. Currently, the other map_page APIs are passing
> device as argument. Would it also be recommended to use group as argument?

Well I'm only concerned about the API we're introducing at the moment, I'm
not suggesting we change existing ones. Because PASID is a new concept and
is currently unregulated, it would be good for new users to understand
what kind of isolation they are getting from it. And it is more important
than for previous APIs because SVM's main objective is to simplify the
userspace programming model, and therefore bring e.g. GPU programming to
users that will be more naive with regard to hardware properties and
limitations.

I'm thinking for instance about GPU drivers using the bind API to provide
OpenCL SVM to userspace. If the person writing the driver has to pass
IOMMU groups instead of devices, they will have less chance to fall into
the trap described above. They would have to follow the VFIO model, and
propagate the concept of IOMMU groups all the way to userspace.

As I said, maybe we can just add a comment warning future users about the
limitations of PASID isolation and that will be enough, I really don't
know what's best. Since VFIO will likely stay the only user of PASID
table binding and handles IOMMU groups well, I think it boils down to
a stylistic decision in your case.
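
For reference, the shape I have in mind for a group-based bind is roughly
the following. This is a hypothetical sketch with made-up sketch_* names
and a plain int standing in for a task; it is not a proposal for the actual
prototypes. The only point is that the device-level entry point cannot
avoid going through the group.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEVS 8

struct sketch_group;

/* Hypothetical stand-ins for struct device and struct iommu_group. */
struct sketch_dev {
	struct sketch_group *group;
	int bound_task;		/* 0 means "nothing bound" */
};

struct sketch_group {
	struct sketch_dev *devs[SKETCH_MAX_DEVS];
	size_t ndevs;
};

/* Group-level bind: every device in the group gets the task, explicitly. */
int sketch_bind_group(struct sketch_group *group, int task)
{
	assert(group != NULL);
	if (task == 0)
		return -1;
	for (size_t i = 0; i < group->ndevs; i++)
		group->devs[i]->bound_task = task;
	return 0;
}

/* Device-level convenience wrapper; it simply forwards to the group. */
int sketch_bind_device(struct sketch_dev *dev, int task)
{
	assert(dev != NULL && dev->group != NULL);
	return sketch_bind_group(dev->group, task);
}
```

A caller that binds a task to device A through sketch_bind_device() ends up
with the same task recorded for every other device in A's group, so the
shared access is visible in the data structures instead of being implicit.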

Thanks,
Jean

>>>> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
>>>> might be simpler to let the IOMMU core take the group lock and do
>>>> group->domain->ops->bind_task(dev...) for each device. The question also
>>>> holds for iommu_do_invalidate in patch 3/8.
>>>
>>> In my understanding, it is moving the for_each_dev loop into iommu driver?
>>> Is it?
>>
>> Yes, that's what I meant
>>
>>>> This way the prototypes would be:
>>>> int iommu_bind...(struct iommu_group *group, struct ... *info)
>>>> int iommu_unbind...(struct iommu_group *group, struct ...*info)
>>>> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
>>>
>>> For PASID table binding from guest, I think it'd better to be per-device op
>>> since the bind operation wants to modify the host context entry. But we may
>>> still share the API and do things differently in iommu driver.
>>
>> Sure, as said above the use cases for PASID table and single PASID binding
>> are different, sharing the API is not strictly necessary.
>>
>>> For invalidation, I think it'd better to be per-group. Actually, with guest
>>> IOMMU exists, there is only one group in a domain on Intel platform. Do it for
>>> each device is not expected. How about it on ARM?
>>
>> In ARM systems with the DMA API (IOMMU_DOMAIN_DMA), there is one group per
>> domain. But with VFIO (IOMMU_DOMAIN_UNMANAGED), VFIO will try to attach
>> multiple groups in the same container to the same domain when possible.
>>
>>>> For PASID table binding it might not matter much, as VFIO will most likely
>>>> be the only user. But task binding will be called by device drivers, which
>>>> by now should be encouraged to do things at iommu_group granularity.
>>>> Alternatively it could be done implicitly like in iommu_attach_device,
>>>> with "iommu_bind_device_x" calling "iommu_bind_group_x".
>>>
>>> Do you mean the bind task from userspace driver? I guess you're trying to do
>>> different types of binding request in a single svm_bind API?
>>>
>>>>
>>>> Extending this reasoning, since groups in a domain are also supposed to
>>>> have the same mappings, then similarly to map/unmap,
>>>> bind/unbind/invalidate should really be done with an iommu_domain (and
>>>> nothing else) as target argument. However this requires the IOMMU core to
>>>> keep a group list in each domain, which might complicate things a little
>>>> too much.
>>>>
>>>> But "all devices in a domain share the same PASID table" is the paradigm
>>>> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
>>>> iommu_group, it should be made more explicit to users, so they don't
>>>> assume that devices within a domain are isolated from each others with
>>>> regard to PASID DMA.
>>>
>>> Is the isolation you mentioned means forbidding to do PASID DMA to the same
>>> virtual address space when the device comes from different domain?
>>
>> In the above example, devices A, B and C are in the same IOMMU domain
>> (because, for instance, user put the two groups in the same VFIO
>> container.) Then in the SMMUv3 driver they would all share the same PASID
>> table. A, B and C can access Task 1 with the PASID obtained during the
>> depicted bind. They don't need to call bind again for device C, though it
>> would be good practice.
>>
>> But D is in a different domain, so unless you also call bind on Task 1 for
>> device D, there is no way that D can access Task 1.
>>
>> Thanks,
>> Jean
>>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-05-12 12:11     ` [Qemu-devel] " Jean-Philippe Brucker
@ 2017-07-02 10:06         ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-02 10:06 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, qemu-devel, iommu,
	jacob.jun.pan

On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:

Hi Jean,

We've had a few discussions on this, so I'd like to reach a conclusion and
make it a reference for future discussion.

Currently, we are inclined to have a hybrid format for the iommu tlb
invalidation from userspace (vIOMMU or userspace driver).

Based on the previous discussion, would the below work?

1. Add an IOCTL for iommu tlb invalidation.

VFIO_IOMMU_TLB_INVALIDATE

struct vfio_iommu_tlb_invalidate {
   __u32   argsz;
   __u32   length;
   __u8    data[];
};

Comment from Alex Williamson: would it be more suitable to add a new flag
bit to vfio_device_svm (a structure defined in patch 5 of this patchset),
since the data structures are so similar?

Personally, I'm ok with it. Please let me know your thoughts. However, the
precondition is that we accept the whole definition in this email. If not,
vfio_iommu_tlb_invalidate would be defined differently.

2. Define a structure in include/uapi/linux/iommu.h (newly added header file)

struct iommu_tlb_invalidate {
	__u32	scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID	(1 << 0)
/* address-selective invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR	(1 << 1)
	__u32	flags;
/*  targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
/* indicating that the pIOMMU doesn't need to invalidate
   all intermediate tables cached as part of the PTW for
   vaddr, only the last-level entry (pte). This is a hint. */
#define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	__u8	data[];
};

For this part, the scope and flags are basically aligned with your previous
email. I renamed the prefix to "IOMMU_". In my opinion, since the scope and
flags would be filled in by the vIOMMU emulator and parsed by the underlying
iommu driver, they are better defined in a uapi header file.
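
As a cross-check of how I read the scope bits (following your description
quoted below), here is a small sketch of the mapping from @scope to what
gets invalidated. The enum and the inv_* function names are made up for
this mail; only the two IOMMU_INVALIDATE_* scope bits come from the
proposal.

```c
#include <assert.h>
#include <stdint.h>

#define IOMMU_INVALIDATE_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Hypothetical names, used only to spell out the four cases. */
enum inv_target {
	INV_EVERYTHING,		/* all PASIDs, all addresses */
	INV_ONE_PASID,		/* all addresses of @pasid */
	INV_RANGE_ALL_PASIDS,	/* (@vaddr, @size) in every PASID */
	INV_RANGE_ONE_PASID,	/* (@vaddr, @size) in @pasid only */
	INV_BAD_SCOPE,		/* unknown scope bits set */
};

enum inv_target inv_classify(uint32_t scope)
{
	const uint32_t known = IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR;

	if (scope & ~known)
		return INV_BAD_SCOPE;

	switch (scope) {
	case 0:
		return INV_EVERYTHING;
	case IOMMU_INVALIDATE_PASID:
		return INV_ONE_PASID;
	case IOMMU_INVALIDATE_VADDR:
		return INV_RANGE_ALL_PASIDS;
	default:	/* both bits set */
		return INV_RANGE_ONE_PASID;
	}
}

/* Self-check of the table above. */
void inv_classify_selftest(void)
{
	assert(inv_classify(0) == INV_EVERYTHING);
	assert(inv_classify(IOMMU_INVALIDATE_VADDR) == INV_RANGE_ALL_PASIDS);
}
```

If this table does not match your intent, especially the scope == 0 case,
please correct me.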

Besides the reason above, I don't want VFIO to engage too much in the data
parsing. If we move the scope, flags, pasid, vaddr and size fields into
vfio_iommu_tlb_invalidate, then both kernel space vfio and user space vfio
need to do a lot of parsing. So I prefer the way above.
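
To show what the hybrid format would look like from the userspace side,
here is a minimal sketch that packs a struct iommu_tlb_invalidate into the
data[] of the VFIO structure. It is an illustration under assumptions: the
structures are userspace mirrors using uintN_t instead of __uN, the
model-specific tail is left empty, and build_range_invalidate() is a
made-up helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IOMMU_INVALIDATE_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Userspace mirrors of the two proposed structures. */
struct iommu_tlb_invalidate {
	uint32_t scope;
	uint32_t flags;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
	uint8_t data[];		/* model-specific part, empty in this sketch */
};

struct vfio_iommu_tlb_invalidate {
	uint32_t argsz;		/* size of the whole ioctl argument */
	uint32_t length;	/* size of the payload in data[] */
	uint8_t data[];		/* carries a struct iommu_tlb_invalidate */
};

/* Returns a heap buffer the caller must free(), or NULL on failure. */
struct vfio_iommu_tlb_invalidate *
build_range_invalidate(uint32_t pasid, uint64_t vaddr, uint64_t size)
{
	struct iommu_tlb_invalidate inv;
	struct vfio_iommu_tlb_invalidate *arg;

	/* The VFIO header is expected to be two 32-bit words. */
	assert(sizeof(struct vfio_iommu_tlb_invalidate) == 8);

	memset(&inv, 0, sizeof(inv));	/* also clears padding bytes */
	inv.scope = IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR;
	inv.pasid = pasid;
	inv.vaddr = vaddr;
	inv.size = size;

	arg = calloc(1, sizeof(*arg) + sizeof(inv));
	if (!arg)
		return NULL;

	arg->length = (uint32_t)sizeof(inv);
	arg->argsz = (uint32_t)(sizeof(*arg) + sizeof(inv));
	/* The generic part travels as an opaque payload through VFIO. */
	memcpy(arg->data, &inv, sizeof(inv));
	return arg;
}
```

One thing the sketch makes visible: on ABIs where 64-bit fields are 8-byte
aligned there is a 4-byte hole after @pasid, so the final layout may want
an explicit padding field.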

If you've got any other ideas, please feel free to post them. They're welcome.

Thanks,
Yi L

> Hi Yi,
> 
> On 26/04/17 11:12, Liu, Yi L wrote:
> > From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> > 
> > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> > invalidate request from guest to host.
> > 
> > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > no knowledge of caching structure updates unless the guest
> > invalidation activities are passed down to the host. So a new
> > IOCTL is needed to propagate the guest cache invalidation through
> > VFIO.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  include/uapi/linux/vfio.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 6b97987..50c51f8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >  
> >  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >  
> > +/* For IOMMU TLB Invalidation Propagation */
> > +struct vfio_iommu_tlb_invalidate {
> > +	__u32	argsz;
> > +	__u32	length;
> > +	__u8	data[];
> > +};
> 
> We initially discussed something a little more generic than this, with
> most info explicitly described and only pIOMMU-specific quirks and hints
> in an opaque structure. Out of curiosity, why the change? I'm not against
> a fully opaque structure, but there seems to be a large overlap between TLB
> invalidations across architectures.
> 
> 
> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> with the following.
> 
> (From the paravirtualized POV, the SMMU also has to swizzle endianness
> after unpacking an opaque structure, since userspace doesn't know what's
> in it and guest might use a different endianness. So we need to force all
> opaque data to be e.g. little-endian.)
> 
> struct vfio_iommu_tlb_invalidate {
> 	__u32	argsz;
> 	__u32	scope;
> 	__u32	flags;
> 	__u32	pasid;
> 	__u64	vaddr;
> 	__u64	size;
> 	__u8	data[];
> };
> 
> Scope is a bitfield restricting the invalidation scope. By default
> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> and @size are unused.
> 
> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> scope to the pasid described by @pasid.
> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> scope to the address range described by (@vaddr, @size).
> 
> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> range for *all* pasids (as well as no_pasid). Setting scope =
> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> the VA range only for @pasid.
> 
> Flags depend on the selected scope:
> 
> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> exclusively (some architectures, e.g. SMMU, allow this)
> 
> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> only the last-level entry (pte). This is a hint.
> 
> I guess what's missing for Intel IOMMU and would go in @data is the
> "global" hint (which we don't have in SMMU invalidations). Do you see
> anything else, that the pIOMMU cannot deduce from this structure?
> 
> Thanks,
> Jean
> 
> 
> > +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >  
> >  /*
> > 
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-03 11:52         ` Jean-Philippe Brucker
@ 2017-07-03 10:31               ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-03 10:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan, kevin.tian, kvm, jasowang, Will Deacon, qemu-devel,
	iommu, jacob.jun.pan

Hi Jean,

On Mon, Jul 03, 2017 at 12:52:52PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 02/07/17 11:06, Liu, Yi L wrote:
> > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> > 
> > Hi Jean,
> > 
> > As we've got a few discussions on it. I'd like to have a conclusion and
> > make it as a reference for future discussion.
> > 
> > Currently, we are inclined to have a hybrid format for the iommu tlb
> > invalidation from userspace(vIOMMU or userspace driver).
> > 
> > Based on the previous discussion, may the below work?
> > 
> > 1. Add a IOCTL for iommu tlb invalidation.
> > 
> > VFIO_IOMMU_TLB_INVALIDATE
> > 
> > struct vfio_iommu_tlb_invalidate {
> >    __u32   argsz;
> >    __u32   length;
> 
> Wouldn't argsz be exactly length + 8? Might be redundant in this case.

Yes, it is. We may not use it in a future version, but if we still use it,
I think we can simplify it.
 
> >    __u8    data[];
> > };
> > 
> > comments from Alex William: is it more suitable to add a new flag bit on
> > vfio_device_svm(a structure defined in patch 5 of this patchset), the data
> > structure is so similar.
> > 
> > Personally, I'm ok with it. Pls let me know your thoughts. However, the
> > precondition is we accept the whole definition in this email. If not, the
> > vfio_iommu_tlb_invalidate would be defined differently.
> 
> With this proposal sharing the structure makes sense. As I understand it
> we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a
> flag bit would be redundant.

Yes, it seems strange to share the vfio_device_svm structure but use a
separate IOCTL cmd. Maybe it's more reasonable to share the IOCTL cmd and
just add a new flag. Then all the svm related operations share the IOCTL.
However, we need to check whether there would be any non-svm related iommu
tlb invalidation. If so, vfio_device_svm should be renamed to be non-svm
specific.

> 
> > 2. Define a structure in include/uapi/linux/iommu.h(newly added header file)
> > 
> > struct iommu_tlb_invalidate {
> > 	__u32	scope;
> > /* pasid-selective invalidation described by @pasid */
> > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > /* address-selective invalidation described by (@vaddr, @size) */
> > #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > 	__u32	flags;
> > /*  targets non-pasid mappings, @pasid is not valid */
> > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> 
> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> using a special mode where PASID 0 is reserved and any traffic without
> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> to invalidate that special context explicitly. But this means that
> invalidation packet targeted at that context will have "scope = PASID" and
> "flags = NO_PASID", which is utterly confusing.
> 
> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> and just use PASID 0 to invalidate this context on ARM. I don't think
> other architectures would use the NO_PASID flag anyway, but might be mistaken.

I'd suggest keeping it for now. On VT-d, we may pass some data in the opaque
part, so we may work without it. But if another vendor wants to issue a
non-PASID tagged cache invalidation, they may run into problems.

> > /* indicating that the pIOMMU doesn't need to invalidate
> >    all intermediate tables cached as part of the PTE for
> >    vaddr, only the last-level entry (pte). This is a hint. */
> > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
> > 	__u32	pasid;
> > 	__u64	vaddr;
> > 	__u64	size;
> > 	__u8	data[];
> > };
> > 
> > For this part, the scope and flags are basically aligned with your previous
> > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags
> > would be filled by vIOMMU emulator and be parsed by underlying iommu driver,
> > it is much more suitable to be defined in a uapi header file.
> 
> I tend to agree, defining a single structure in a new IOMMU UAPI file is
> better than having identical structures both in uapi/linux/vfio.h and
> linux/iommu.h. This way we avoid VFIO having to copy the same structure
> field by field. Arch-specific structures that go in
> iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h

yes, it is.

> > Besides the reason above, I don't want VFIO to engage too much in the data parsing.
> > If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate,
> > then both kernel space vfio and user space vfio needs to do much parsing. So I
> > may prefer the way above.
> 
> Would the entire validation of struct iommu_tlb_invalidate be offloaded to
> the IOMMU subsystem? Checking the structure sizes, copying from user, and
> validating the flags?

No, the copying from user and the flag validation are still in VFIO. The
basic idea is still to pass iommu_tlb_invalidate.data to the iommu sub-system.

> I guess it's mostly an implementation detail, but it might be better to
> keep this code in VFIO as well, even for the validation of
> iommu_tlb_invalidate.data (which would require VFIO to keep track of the
> model used during bind). This way VFIO sanitizes what comes from
> userspace, whilst the IOMMU subsystem only deals with valid kernel
> structures, and another subsystem could easily reuse the
> iommu_tlb_invalidate API.

It's a good idea, I'll think about it. VFIO should also be able to parse the
generic part of the iommu_tlb_invalidate.data.
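
A rough sketch of how that split could look, with structural checks on the
VFIO side and checks on the meaning of the fields on the IOMMU side. All
sketch_* names are hypothetical, the 8-byte header comes from the
"length + 8" remark above, and the leaf rule in sketch_check_meaning() is
only my guess at what a semantic check might be, not something agreed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define IOMMU_INVALIDATE_PASID		(1u << 0)
#define IOMMU_INVALIDATE_VADDR		(1u << 1)
#define IOMMU_INVALIDATE_NO_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR_LEAF	(1u << 1)

#define SKETCH_VFIO_HDR_SIZE	8u	/* argsz + length */

/* VFIO side: sizes must be consistent before anything is copied. */
int sketch_check_sizes(uint32_t argsz, uint32_t length, uint32_t generic_size)
{
	if (argsz < SKETCH_VFIO_HDR_SIZE)
		return -EINVAL;
	if (length != argsz - SKETCH_VFIO_HDR_SIZE)
		return -EINVAL;
	if (length < generic_size)	/* payload too short for the generic part */
		return -EINVAL;
	return 0;
}

/* VFIO side: reject bits that are not defined in the uapi header. */
int sketch_check_bits(uint32_t scope, uint32_t flags)
{
	if (scope & ~(IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR))
		return -EINVAL;
	if (flags & ~(IOMMU_INVALIDATE_NO_PASID | IOMMU_INVALIDATE_VADDR_LEAF))
		return -EINVAL;
	return 0;
}

/* IOMMU side: does this combination of valid bits make sense? */
int sketch_check_meaning(uint32_t scope, uint32_t flags)
{
	/* Assumed rule: a leaf hint needs an address-selective scope. */
	if ((flags & IOMMU_INVALIDATE_VADDR_LEAF) &&
	    !(scope & IOMMU_INVALIDATE_VADDR))
		return -EINVAL;
	return 0;
}

/* Self-check of the size rule with a 32-byte generic part. */
void sketch_selftest(void)
{
	assert(sketch_check_sizes(40, 32, 32) == 0);
	assert(sketch_check_sizes(40, 30, 32) == -EINVAL);
}
```

With this split, the IOMMU subsystem only ever sees a structure whose sizes
and bits have already been sanitized, which is what would let another
subsystem reuse the same API.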


Thanks,
Yi L

> The IOMMU subsystem would still validate the meaning of the fields, for
> instance whether a given combination of flag is allowed or if the PASID
> exists.
> 
> Thanks,
> Jean
> 
> > If you've got any other idea, pls feel free to post it. It's welcomed.
> > 
> > Thanks,
> > Yi L
> > 
> >> Hi Yi,
> >>
> >> On 26/04/17 11:12, Liu, Yi L wrote:
> >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> >>>
> >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> >>> invalidate request from guest to host.
> >>>
> >>> In the case of SVM virtualization on VT-d, host IOMMU driver has
> >>> no knowledge of caching structure updates unless the guest
> >>> invalidation activities are passed down to the host. So a new
> >>> IOCTL is needed to propagate the guest cache invalidation through
> >>> VFIO.
> >>>
> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >>> ---
> >>>  include/uapi/linux/vfio.h | 9 +++++++++
> >>>  1 file changed, 9 insertions(+)
> >>>
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 6b97987..50c51f8 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >>>  
> >>>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >>>  
> >>> +/* For IOMMU TLB Invalidation Propagation */
> >>> +struct vfio_iommu_tlb_invalidate {
> >>> +	__u32	argsz;
> >>> +	__u32	length;
> >>> +	__u8	data[];
> >>> +};
> >>
> >> We initially discussed something a little more generic than this, with
> >> most info explicitly described and only pIOMMU-specific quirks and hints
> >> in an opaque structure. Out of curiosity, why the change? I'm not against
> >> a fully opaque structure, but there seem to be a large overlap between TLB
> >> invalidations across architectures.
> >>
> >>
> >> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> >> with the following.
> >>
> >> (From the paravirtualized POV, the SMMU also has to swizzle endianess
> >> after unpacking an opaque structure, since userspace doesn't know what's
> >> in it and guest might use a different endianess. So we need to force all
> >> opaque data to be e.g. little-endian.)
> >>
> >> struct vfio_iommu_tlb_invalidate {
> >> 	__u32	argsz;
> >> 	__u32	scope;
> >> 	__u32	flags;
> >> 	__u32	pasid;
> >> 	__u64	vaddr;
> >> 	__u64	size;
> >> 	__u8	data[];
> >> };
> >>
> >> Scope is a bitfields restricting the invalidation scope. By default
> >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> >> and @size are unused.
> >>
> >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> >> scope to the pasid described by @pasid.
> >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> >> scope to the address range described by (@vaddr, @size).
> >>
> >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> >> range for *all* pasids (as well as no_pasid). Setting scope =
> >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> >> the VA range only for @pasid.
> >>
> >> Flags depend on the selected scope:
> >>
> >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> >> exclusively (some architectures, e.g. SMMU, allow this)
> >>
> >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> >> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> >> only the last-level entry (pte). This is a hint.
> >>
> >> I guess what's missing for Intel IOMMU and would go in @data is the
> >> "global" hint (which we don't have in SMMU invalidations). Do you see
> >> anything else, that the pIOMMU cannot deduce from this structure?
> >>
> >> Thanks,
> >> Jean
> >>
> >>
> >>> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> >>> +
> >>>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >>>  
> >>>  /*
> >>>
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-07-03 10:31               ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-03 10:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: tianyu.lan, kevin.tian, Liu, Yi L, ashok.raj, kvm, jasowang,
	Will Deacon, alex.williamson, peterx, qemu-devel, iommu,
	jacob.jun.pan

Hi Jean,

On Mon, Jul 03, 2017 at 12:52:52PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 02/07/17 11:06, Liu, Yi L wrote:
> > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> > 
> > Hi Jean,
> > 
> > As we've got a few discussions on it. I'd like to have a conclusion and
> > make it as a reference for future discussion.
> > 
> > Currently, we are inclined to have a hybrid format for the iommu tlb
> > invalidation from userspace(vIOMMU or userspace driver).
> > 
> > Based on the previous discussion, may the below work?
> > 
> > 1. Add a IOCTL for iommu tlb invalidation.
> > 
> > VFIO_IOMMU_TLB_INVALIDATE
> > 
> > struct vfio_iommu_tlb_invalidate {
> >    __u32   argsz;
> >    __u32   length;
> 
> Wouldn't argsz be exactly length + 8? Might be redundant in this case.

Yes, it is. We may not use it in a future version, but if we still use it,
I think we can simplify it.
 
> >    __u8    data[];
> > };
> > 
> > comments from Alex William: is it more suitable to add a new flag bit on
> > vfio_device_svm(a structure defined in patch 5 of this patchset), the data
> > structure is so similar.
> > 
> > Personally, I'm ok with it. Pls let me know your thoughts. However, the
> > precondition is we accept the whole definition in this email. If not, the
> > vfio_iommu_tlb_invalidate would be defined differently.
> 
> With this proposal sharing the structure makes sense. As I understand it
> we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a
> flag bit would be redundant.

Yes, it seems strange to share the vfio_device_svm structure but use a
separate IOCTL cmd. Maybe it's more reasonable to share the IOCTL cmd and
just add a new flag. Then all the svm related operations share the IOCTL.
However, we need to check whether there would be any non-svm related iommu
tlb invalidation. If so, vfio_device_svm should be renamed to be non-svm
specific.

> 
> > 2. Define a structure in include/uapi/linux/iommu.h(newly added header file)
> > 
> > struct iommu_tlb_invalidate {
> > 	__u32	scope;
> > /* pasid-selective invalidation described by @pasid */
> > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > /* address-selevtive invalidation described by (@vaddr, @size) */
> > #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > 	__u32	flags;
> > /*  targets non-pasid mappings, @pasid is not valid */
> > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> 
> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> using a special mode where PASID 0 is reserved and any traffic without
> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> to invalidate that special context explicitly. But this means that
> invalidation packet targeted at that context will have "scope = PASID" and
> "flags = NO_PASID", which is utterly confusing.
> 
> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> and just use PASID 0 to invalidate this context on ARM. I don't think
> other architectures would use the NO_PASID flag anyway, but might be mistaken.

I'd suggest keeping it for now. On VT-d, we may pass some data in the opaque
part, so we may work without it. But if another vendor wants to issue a
non-PASID tagged cache invalidation, they may run into problems.

> > /* indicating that the pIOMMU doesn't need to invalidate
> >    all intermediate tables cached as part of the PTW for
> >    vaddr, only the last-level entry (pte). This is a hint. */
> > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
> > 	__u32	pasid;
> > 	__u64	vaddr;
> > 	__u64	size;
> > 	__u8	data[];
> > };
> > 
> > For this part, the scope and flags are basically aligned with your previous
> > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags
> > would be filled by vIOMMU emulator and be parsed by underlying iommu driver,
> > it is much more suitable to be defined in a uapi header file.
> 
> I tend to agree, defining a single structure in a new IOMMU UAPI file is
> better than having identical structures both in uapi/linux/vfio.h and
> linux/iommu.h. This way we avoid VFIO having to copy the same structure
> field by field. Arch-specific structures that go in
> iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h

yes, it is.

> > Besides the reason above, I don't want VFIO to engage too much in the data parsing.
> > If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate,
> > then both kernel space vfio and user space vfio needs to do much parsing. So I
> > may prefer the way above.
> 
> Would the entire validation of struct iommu_tlb_invalidate be offloaded to
> the IOMMU subsystem? Checking the structure sizes, copying from user, and
> validating the flags?

No, the copying from user and the flag validation are still in VFIO. The basic
idea is still to pass iommu_tlb_invalidate.data on to the iommu sub-system.

> I guess it's mostly an implementation detail, but it might be better to
> keep this code in VFIO as well, even for the validation of
> iommu_tlb_invalidate.data (which would require VFIO to keep track of the
> model used during bind). This way VFIO sanitizes what comes from
> userspace, whilst the IOMMU subsystem only deals with valid kernel
> structures, and another subsystem could easily reuse the
> iommu_tlb_invalidate API.

It's a good idea, I'll think about it. VFIO should also be able to parse the
generic part of the iommu_tlb_invalidate.data.


Thanks,
Yi L

> The IOMMU subsystem would still validate the meaning of the fields, for
> instance whether a given combination of flag is allowed or if the PASID
> exists.
> 
> Thanks,
> Jean
> 
> > If you've got any other idea, pls feel free to post it. It's welcomed.
> > 
> > Thanks,
> > Yi L
> > 
> >> Hi Yi,
> >>
> >> On 26/04/17 11:12, Liu, Yi L wrote:
> >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> >>>
> >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> >>> invalidate request from guest to host.
> >>>
> >>> In the case of SVM virtualization on VT-d, host IOMMU driver has
> >>> no knowledge of caching structure updates unless the guest
> >>> invalidation activities are passed down to the host. So a new
> >>> IOCTL is needed to propagate the guest cache invalidation through
> >>> VFIO.
> >>>
> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >>> ---
> >>>  include/uapi/linux/vfio.h | 9 +++++++++
> >>>  1 file changed, 9 insertions(+)
> >>>
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 6b97987..50c51f8 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >>>  
> >>>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >>>  
> >>> +/* For IOMMU TLB Invalidation Propagation */
> >>> +struct vfio_iommu_tlb_invalidate {
> >>> +	__u32	argsz;
> >>> +	__u32	length;
> >>> +	__u8	data[];
> >>> +};
> >>
> >> We initially discussed something a little more generic than this, with
> >> most info explicitly described and only pIOMMU-specific quirks and hints
> >> in an opaque structure. Out of curiosity, why the change? I'm not against
> >> a fully opaque structure, but there seem to be a large overlap between TLB
> >> invalidations across architectures.
> >>
> >>
> >> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> >> with the following.
> >>
> >> (From the paravirtualized POV, the SMMU also has to swizzle endianness
> >> after unpacking an opaque structure, since userspace doesn't know what's
> >> in it and guest might use a different endianness. So we need to force all
> >> opaque data to be e.g. little-endian.)
> >>
> >> struct vfio_iommu_tlb_invalidate {
> >> 	__u32	argsz;
> >> 	__u32	scope;
> >> 	__u32	flags;
> >> 	__u32	pasid;
> >> 	__u64	vaddr;
> >> 	__u64	size;
> >> 	__u8	data[];
> >> };
> >>
> >> Scope is a bitfields restricting the invalidation scope. By default
> >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> >> and @size are unused.
> >>
> >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> >> scope to the pasid described by @pasid.
> >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> >> scope to the address range described by (@vaddr, @size).
> >>
> >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> >> range for *all* pasids (as well as no_pasid). Setting scope =
> >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> >> the VA range only for @pasid.
> >>
> >> Flags depend on the selected scope:
> >>
> >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> >> exclusively (some architectures, e.g. SMMU, allow this)
> >>
> >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> >> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> >> only the last-level entry (pte). This is a hint.
> >>
> >> I guess what's missing for Intel IOMMU and would go in @data is the
> >> "global" hint (which we don't have in SMMU invalidations). Do you see
> >> anything else, that the pIOMMU cannot deduce from this structure?
> >>
> >> Thanks,
> >> Jean
> >>
> >>
> >>> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> >>> +
> >>>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >>>  
> >>>  /*
> >>>
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-02 10:06         ` Liu, Yi L
  (?)
@ 2017-07-03 11:52         ` Jean-Philippe Brucker
       [not found]           ` <0e4f2dd4-d553-b1b7-7bec-fe0ff5242c54-5wv7dgnIgG8@public.gmane.org>
  -1 siblings, 1 reply; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-03 11:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, kvm, iommu, alex.williamson, peterx, tianyu.lan,
	kevin.tian, ashok.raj, jasowang, qemu-devel, jacob.jun.pan,
	Will Deacon

Hi Yi,

On 02/07/17 11:06, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> 
> Hi Jean,
> 
> As we've got a few discussions on it. I'd like to have a conclusion and
> make it as a reference for future discussion.
> 
> Currently, we are inclined to have a hybrid format for the iommu tlb
> invalidation from userspace(vIOMMU or userspace driver).
> 
> Based on the previous discussion, may the below work?
> 
> 1. Add a IOCTL for iommu tlb invalidation.
> 
> VFIO_IOMMU_TLB_INVALIDATE
> 
> struct vfio_iommu_tlb_invalidate {
>    __u32   argsz;
>    __u32   length;

Wouldn't argsz be exactly length + 8? Might be redundant in this case.

>    __u8    data[];
> };
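To make the arithmetic behind that question concrete, here is a minimal
sketch, assuming the layout quoted above with uint32_t/uint8_t as user-space
stand-ins for __u32/__u8. The helper tlb_inv_argsz() is a hypothetical name
used only for illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as proposed in the patch; fixed-width stand-ins for __u32/__u8. */
struct vfio_iommu_tlb_invalidate {
	uint32_t argsz;
	uint32_t length;
	uint8_t  data[];
};

/* The fixed header is two 32-bit words, so data[] starts at byte 8. */
static_assert(offsetof(struct vfio_iommu_tlb_invalidate, data) == 8,
	      "header is expected to be 8 bytes");

/* Hypothetical helper: total argument size for a payload of @length bytes. */
static inline uint32_t tlb_inv_argsz(uint32_t length)
{
	return (uint32_t)offsetof(struct vfio_iommu_tlb_invalidate, data) + length;
}
```

If argsz always equals this value, one of the two fields carries no extra
information.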
> 
> comments from Alex Williamson: is it more suitable to add a new flag bit on
> vfio_device_svm(a structure defined in patch 5 of this patchset), the data
> structure is so similar.
> 
> Personally, I'm ok with it. Pls let me know your thoughts. However, the
> precondition is we accept the whole definition in this email. If not, the
> vfio_iommu_tlb_invalidate would be defined differently.

With this proposal sharing the structure makes sense. As I understand it
we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a
flag bit would be redundant.

> 2. Define a structure in include/uapi/linux/iommu.h(newly added header file)
> 
> struct iommu_tlb_invalidate {
> 	__u32	scope;
> /* pasid-selective invalidation described by @pasid */
> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> /* address-selective invalidation described by (@vaddr, @size) */
> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> 	__u32	flags;
> /*  targets non-pasid mappings, @pasid is not valid */
> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)

Although it was my proposal, I don't like this flag. In ARM SMMU, we're
using a special mode where PASID 0 is reserved and any traffic without
PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
to invalidate that special context explicitly. But this means that
invalidation packet targeted at that context will have "scope = PASID" and
"flags = NO_PASID", which is utterly confusing.

I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
and just use PASID 0 to invalidate this context on ARM. I don't think
other architectures would use the NO_PASID flag anyway, but might be mistaken.

> /* indicating that the pIOMMU doesn't need to invalidate
>    all intermediate tables cached as part of the PTW for
>    vaddr, only the last-level entry (pte). This is a hint. */
> #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
> 	__u32	pasid;
> 	__u64	vaddr;
> 	__u64	size;
> 	__u8	data[];
> };
> 
> For this part, the scope and flags are basically aligned with your previous
> email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags
> would be filled by vIOMMU emulator and be parsed by underlying iommu driver,
> it is much more suitable to be defined in a uapi header file.

I tend to agree, defining a single structure in a new IOMMU UAPI file is
better than having identical structures both in uapi/linux/vfio.h and
linux/iommu.h. This way we avoid VFIO having to copy the same structure
field by field. Arch-specific structures that go in
iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h

> Besides the reason above, I don't want VFIO to engage too much in the data parsing.
> If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate,
> then both kernel space vfio and user space vfio needs to do much parsing. So I
> may prefer the way above.

Would the entire validation of struct iommu_tlb_invalidate be offloaded to
the IOMMU subsystem? Checking the structure sizes, copying from user, and
validating the flags?

I guess it's mostly an implementation detail, but it might be better to
keep this code in VFIO as well, even for the validation of
iommu_tlb_invalidate.data (which would require VFIO to keep track of the
model used during bind). This way VFIO sanitizes what comes from
userspace, whilst the IOMMU subsystem only deals with valid kernel
structures, and another subsystem could easily reuse the
iommu_tlb_invalidate API.

The IOMMU subsystem would still validate the meaning of the fields, for
instance whether a given combination of flags is allowed or if the PASID
exists.
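The split described above could be sketched roughly as follows. This is only
an illustration under stated assumptions: uint32_t/uint64_t stand in for
__u32/__u64, NO_PASID is assumed to be dropped as suggested earlier in this
mail, and the mask macros and tlb_inv_sanitize() are hypothetical names. It
covers only the syntactic checks VFIO could do after copying from userspace;
whether a combination is meaningful, or whether the PASID exists, would remain
with the IOMMU subsystem.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Scope bits, as quoted above. */
#define IOMMU_INVALIDATE_PASID		(1u << 0)
#define IOMMU_INVALIDATE_VADDR		(1u << 1)
/* Flag bit, as quoted above (NO_PASID assumed removed). */
#define IOMMU_INVALIDATE_VADDR_LEAF	(1u << 1)

/* Hypothetical masks covering every bit defined so far. */
#define IOMMU_INVALIDATE_SCOPE_MASK \
	(IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR)
#define IOMMU_INVALIDATE_FLAGS_MASK	(IOMMU_INVALIDATE_VADDR_LEAF)

struct iommu_tlb_invalidate {
	uint32_t scope;
	uint32_t flags;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
	uint8_t  data[];
};

/* 64-bit fields of a user-visible structure should be naturally aligned. */
static_assert(offsetof(struct iommu_tlb_invalidate, vaddr) % 8 == 0,
	      "vaddr must be 8-byte aligned");

/*
 * Hypothetical syntactic check: reject a buffer that is too short for the
 * fixed part, and reject any scope or flag bit that is not defined.
 * Returns 0 on success or a negative errno value.
 */
static int tlb_inv_sanitize(const struct iommu_tlb_invalidate *inv, size_t len)
{
	if (len < sizeof(*inv))
		return -EINVAL;
	if (inv->scope & ~IOMMU_INVALIDATE_SCOPE_MASK)
		return -EINVAL;
	if (inv->flags & ~IOMMU_INVALIDATE_FLAGS_MASK)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits up front also leaves room to define new scope or flag
bits later without changing the meaning of existing requests.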

Thanks,
Jean

> If you've got any other idea, pls feel free to post it. It's welcomed.
> 
> Thanks,
> Yi L
> 
>> Hi Yi,
>>
>> On 26/04/17 11:12, Liu, Yi L wrote:
>>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
>>>
>>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
>>> invalidate request from guest to host.
>>>
>>> In the case of SVM virtualization on VT-d, host IOMMU driver has
>>> no knowledge of caching structure updates unless the guest
>>> invalidation activities are passed down to the host. So a new
>>> IOCTL is needed to propagate the guest cache invalidation through
>>> VFIO.
>>>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  include/uapi/linux/vfio.h | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 6b97987..50c51f8 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>>>  
>>>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
>>>  
>>> +/* For IOMMU TLB Invalidation Propagation */
>>> +struct vfio_iommu_tlb_invalidate {
>>> +	__u32	argsz;
>>> +	__u32	length;
>>> +	__u8	data[];
>>> +};
>>
>> We initially discussed something a little more generic than this, with
>> most info explicitly described and only pIOMMU-specific quirks and hints
>> in an opaque structure. Out of curiosity, why the change? I'm not against
>> a fully opaque structure, but there seem to be a large overlap between TLB
>> invalidations across architectures.
>>
>>
>> For what it's worth, when prototyping the paravirtualized IOMMU I came up
>> with the following.
>>
>> (From the paravirtualized POV, the SMMU also has to swizzle endianness
>> after unpacking an opaque structure, since userspace doesn't know what's
>> in it and guest might use a different endianness. So we need to force all
>> opaque data to be e.g. little-endian.)
>>
>> struct vfio_iommu_tlb_invalidate {
>> 	__u32	argsz;
>> 	__u32	scope;
>> 	__u32	flags;
>> 	__u32	pasid;
>> 	__u64	vaddr;
>> 	__u64	size;
>> 	__u8	data[];
>> };
>>
>> Scope is a bitfields restricting the invalidation scope. By default
>> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
>> and @size are unused.
>>
>> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
>> scope to the pasid described by @pasid.
>> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
>> scope to the address range described by (@vaddr, @size).
>>
>> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
>> range for *all* pasids (as well as no_pasid). Setting scope =
>> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
>> the VA range only for @pasid.
>>
>> Flags depend on the selected scope:
>>
>> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
>> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
>> exclusively (some architectures, e.g. SMMU, allow this)
>>
>> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
>> to invalidate all intermediate tables cached as part of the PTW for vaddr,
>> only the last-level entry (pte). This is a hint.
>>
>> I guess what's missing for Intel IOMMU and would go in @data is the
>> "global" hint (which we don't have in SMMU invalidations). Do you see
>> anything else, that the pIOMMU cannot deduce from this structure?
>>
>> Thanks,
>> Jean
>>
>>
>>> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
>>> +
>>>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>>>  
>>>  /*
>>>
>>
>>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-05-14 10:55       ` [Qemu-devel] " Liu, Yi L
@ 2017-07-05  5:32           ` Tian, Kevin
  -1 siblings, 0 replies; 116+ messages in thread
From: Tian, Kevin @ 2017-07-05  5:32 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: Lan, Tianyu, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

> From: Liu, Yi L [mailto:yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org]
> Sent: Sunday, May 14, 2017 6:55 PM
> 
> On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote:
> > On Wed, 26 Apr 2017 18:12:04 +0800
> > "Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> >
> > > From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > >
> > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU
> TLB
> > > invalidate request from guest to host.
> > >
> > > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > > no knowledge of caching structure updates unless the guest
> > > invalidation activities are passed down to the host. So a new
> > > IOCTL is needed to propagate the guest cache invalidation through
> > > VFIO.
> > >
> > > Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > > ---
> > >  include/uapi/linux/vfio.h | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 6b97987..50c51f8 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> > >
> > >  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE +
> 22)
> > >
> > > +/* For IOMMU TLB Invalidation Propagation */
> > > +struct vfio_iommu_tlb_invalidate {
> > > +	__u32	argsz;
> > > +	__u32	length;
> > > +	__u8	data[];
> > > +};
> > > +
> > > +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE +
> 23)
> >
> > I'm kind of wondering why this isn't just a new flag bit on
> > vfio_device_svm, the data structure is so similar.  Of course data
> > needs to be fully specified in uapi.
> 
> Hi Alex,
> 
> For this part, it depends on using opaque structure or not. The following
> link mentioned it in [Open] session.
> 
> http://www.spinics.net/lists/kvm/msg148798.html
> 
> If we pick the full opaque solution for iommu tlb invalidate propagation.
> Then I may add a flag bit on vfio_device_svm and also add definition in
> uapi as you suggested.
> 

There is another benefit to keeping it as a separate command. For now
we only need to invalidate the 1st level translation (GVA->GPA) for SVM,
since the 1st level page table is provided by the guest and walked directly
by the IOMMU. It's possible that some vendor may also choose to implement
a nested 2nd level translation (e.g. GIOVA->GPA->HPA), in which case hardware
can directly walk the guest GIOVA page table and explicit invalidation is
also required. We'd better not tie the invalidation interface to the
svm structure.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-03 10:31               ` Liu, Yi L
@ 2017-07-05  6:45                   ` Tian, Kevin
  -1 siblings, 0 replies; 116+ messages in thread
From: Tian, Kevin @ 2017-07-05  6:45 UTC (permalink / raw)
  To: Liu, Yi L, Jean-Philippe Brucker
  Cc: Lan, Tianyu, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

> From: Liu, Yi L
> Sent: Monday, July 3, 2017 6:31 PM
> 
> Hi Jean,
> 
> 
> >
> > > 2. Define a structure in include/uapi/linux/iommu.h(newly added header
> file)
> > >
> > > struct iommu_tlb_invalidate {
> > > 	__u32	scope;
> > > /* pasid-selective invalidation described by @pasid */
> > > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > > /* address-selective invalidation described by (@vaddr, @size) */
> > > #define IOMMU_INVALIDATE_VADDR	(1 << 1)

For VT-d the above two flags are related. There is no method of flushing
(@vaddr, @size) for all PASIDs, and doing so would not make sense:
address-selective invalidation is valid only for a given PASID. So it's not
appropriate to put them at the same level of scope definition, at least for VT-d.
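As a minimal sketch of that constraint, assuming the two scope bits quoted
above: vtd_scope_valid() is a hypothetical helper name, not an existing
intel-iommu function, and it only encodes the rule stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scope bits, as quoted above. */
#define IOMMU_INVALIDATE_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR	(1u << 1)

static_assert((IOMMU_INVALIDATE_PASID & IOMMU_INVALIDATE_VADDR) == 0,
	      "scope bits must not overlap");

/*
 * Hypothetical VT-d side check: address-selective invalidation is only
 * accepted when a PASID is selected as well.
 */
static bool vtd_scope_valid(uint32_t scope)
{
	if ((scope & IOMMU_INVALIDATE_VADDR) &&
	    !(scope & IOMMU_INVALIDATE_PASID))
		return false;
	return true;
}
```

With a flat bitfield, the combination scope = VADDR alone can be expressed by
userspace but has to be rejected by the VT-d driver.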

> > > 	__u32	flags;
> > > /*  targets non-pasid mappings, @pasid is not valid */
> > > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> >
> > Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> > using a special mode where PASID 0 is reserved and any traffic without
> > PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> > to invalidate that special context explicitly. But this means that
> > invalidation packet targeted at that context will have "scope = PASID" and
> > "flags = NO_PASID", which is utterly confusing.
> >
> > I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID
> flag
> > and just use PASID 0 to invalidate this context on ARM. I don't think
> > other architectures would use the NO_PASID flag anyway, but might be
> mistaken.
> 
> I may suggest to keep it so far. On VT-d, we may pass some data in opaque,
> so
> we may work without it. But if other vendor want to issue non-PASID tagged
> cache, then may encounter problem.

I'm worried about the criteria for deciding which attributes should be
abstracted into the common structure and which can be left opaque. It doesn't
make much sense to do such abstraction purely because different vendor formats
have some common fields. Usually we do such abstraction because
vendor-agnostic code needs to do some common handling before going to
vendor specific code. However, in this case VFIO is not expected to do anything
with those IOMMU specific attributes. The structure is directly forwarded
to the IOMMU driver, which simply translates it back into vendor specific
opaque data. So why bother doing double translations on both the Qemu
and IOMMU driver sides?

Take VT-d for example. Below is a summary of all possible selections around
invalidation of 1st level structure for svm:

Scope: All PASIDs, single PASID
for each PASID:
	all mappings, or page-selective mappings (addr, size)
invalidation target:
	IOTLB entries (leaf)
	paging structure cache (non-leaf)
	PASID cache (pasid->cr3)
invalidation hint:
	whether global pages are included
	drain reads/writes

The above are pretty architectural attributes if we just look at their
functional purpose. If we really consider defining a common structure, it
might be more natural to define a superset of all vendors' capabilities
and remove the opaque field altogether. But as said earlier, the purpose of
doing such abstraction is not clear if there is no vendor-agnostic
user actually digesting those fields. So should we reconsider the
full opaque approach?
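For reference, a rough sketch of how the VT-d selections listed above might
be encoded if such a superset were defined. Every name below is hypothetical
and exists only for illustration; nothing here is an existing kernel
definition or a concrete proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical granularity: scope plus per-PASID selection from the list. */
enum inv_granularity {
	INV_GRANU_ALL_PASIDS,	/* all PASIDs */
	INV_GRANU_PASID,	/* all mappings of a single PASID */
	INV_GRANU_PAGE,		/* (addr, size) within a single PASID */
};

/* Hypothetical invalidation targets. */
#define INV_TARGET_IOTLB	(1u << 0)	/* IOTLB entries (leaf) */
#define INV_TARGET_PAGING	(1u << 1)	/* paging structure cache (non-leaf) */
#define INV_TARGET_PASID_CACHE	(1u << 2)	/* PASID cache (pasid->cr3) */

/* Hypothetical invalidation hints. */
#define INV_HINT_GLOBAL		(1u << 0)	/* global pages are included */
#define INV_HINT_DRAIN_READS	(1u << 1)
#define INV_HINT_DRAIN_WRITES	(1u << 2)

struct inv_request {
	uint32_t granularity;	/* enum inv_granularity */
	uint32_t targets;	/* INV_TARGET_* bitmask */
	uint32_t hints;		/* INV_HINT_* bitmask */
	uint32_t pasid;		/* valid unless INV_GRANU_ALL_PASIDS */
	uint64_t addr;		/* valid for INV_GRANU_PAGE only */
	uint64_t size;		/* valid for INV_GRANU_PAGE only */
};

static_assert(sizeof(struct inv_request) == 32, "no implicit padding expected");

/* Which fields are meaningful for a given granularity. */
static inline int inv_uses_pasid(uint32_t g) { return g != INV_GRANU_ALL_PASIDS; }
static inline int inv_uses_range(uint32_t g) { return g == INV_GRANU_PAGE; }
```

Here page-selective is a granularity nested under a single PASID rather than
an independent scope bit, which matches the VT-d behaviour described above.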

Comments are welcome, since I may be overlooking something here. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-05  6:45                   ` Tian, Kevin
@ 2017-07-05 12:42                       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-05 12:42 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L
  Cc: Lan, Tianyu, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

On 05/07/17 07:45, Tian, Kevin wrote:
>> From: Liu, Yi L
>> Sent: Monday, July 3, 2017 6:31 PM
>>
>> Hi Jean,
>>
>>
>>>
>>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added header
>> file)
>>>>
>>>> struct iommu_tlb_invalidate {
>>>> 	__u32	scope;
>>>> /* pasid-selective invalidation described by @pasid */
>>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
>>>> /* address-selective invalidation described by (@vaddr, @size) */
>>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> 
> For VT-d above two flags are related. There is no method of flushing
> (@vaddr, @size) for all pasids, which doesn't make sense. address-
> selective invalidation is valid only for a given pasid. So it's not appropriate
> to put them in same level of scope definition at least for VT-d.

For ARM SMMU the "flush all by VA" operation is valid. Although it's
unclear at this point if we will ever allow that, it should probably stay
in the common format, if there is one.

>>>> 	__u32	flags;
>>>> /*  targets non-pasid mappings, @pasid is not valid */
>>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
>>>
>>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
>>> using a special mode where PASID 0 is reserved and any traffic without
>>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
>>> to invalidate that special context explicitly. But this means that
>>> invalidation packet targeted at that context will have "scope = PASID" and
>>> "flags = NO_PASID", which is utterly confusing.
>>>
>>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID
>> flag
>>> and just use PASID 0 to invalidate this context on ARM. I don't think
>>> other architectures would use the NO_PASID flag anyway, but might be
>> mistaken.
>>
>> I may suggest to keep it so far. On VT-d, we may pass some data in opaque,
>> so
>> we may work without it. But if other vendor want to issue non-PASID tagged
>> cache, then may encounter problem.
> 
> I'm worried about what's the criteria which attribute should be abstracted
> in common structure and which can be left to opaque. It doesn't make
> much sense to do such abstraction purely because different vendor formats
> have some common fields. Usually we do such abstraction because 
> vendor-agnostic code need to do some common handling before going to
> vendor specific code. However in this case VFIO is not expected to do anything
> with those IOMMU specific attributes. Then the structure is directly forwarded
> to IOMMU driver, which simply translates the structure into vendor specific
> opaque data again. Then why bothering to do double translations in Qemu
> and IOMMU driver side?
>
> Take VT-d for example. Below is a summary of all possible selections around
> invalidation of 1st level structure for svm:
> 
> Scope: All PASIDs, single PASID
> for each PASID:
> 	all mappings, or page-selective mappings (addr, size)
> invalidation target:
> 	IOTLB entries (leaf)
> 	paging structure cache (non-leaf)

I'm curious, can you invalidate all intermediate paging structures for a
given PASID without invalidating the leaves?

> 	PASID cache (pasid->cr3)
I guess any implementation that gives the whole PASID table to userspace
will need the PASID cache invalidation. This was missing from my proposal
since it was from virtio-iommu.

> invalidation hint:
> 	whether global pages are included
> 	drain reads/writes
> 
> The above are fairly architectural attributes if we look only at their
> functional purpose. So if we really consider defining a common structure, it
> might be more natural to define a superset of all vendors' capabilities
> and remove the opaque field altogether. But as said earlier, the purpose of
> doing such abstraction is not clear if there is no vendor-agnostic
> user actually digesting those fields. Then should we reconsider the
> fully opaque approach?
> 
> Comments are welcome, since I may have overlooked something here. :-)

I guess on x86 the invalidation packet formats are stable, but for ARM I'm
reluctant to deal with vendor-specific formats at the API level, because
they tend to be volatile. If a virtual IOMMU version is different from the
physical one, then the page table format will be the same but invalidation
format will not.

So it would be good to define common fields that have the same effects
regardless of the underlying pIOMMU. And the fields that differ between
ARM and x86 seem to only be hints.

In addition on ARM SMMU, the guest cannot build an invalidation command
that the host could simply copy into the hardware command queue. The
pIOMMU driver needs to craft an invalidation command with a Virtual
Machine ID, that the guest is never aware of, and a separate ATS
invalidation command. It might also need to retrieve an Address Space ID
associated with the given PASID if it chose to hide it from the guest.

So for us the invalidation structure would always be different from the
hardware one. That's why I do not have any reason to prefer an opaque
structure in the first place, and defining generic fields looks much
neater :) Then again, I don't have any strong technical objection to it.

Thanks,
Jean
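
[Illustration only, not part of the posted patches.] The translation
described above can be sketched roughly as follows. Everything here is an
assumption made for the sake of the example: the structure layouts, the
names generic_inv, host_inv_cmd, lookup_asid and build_host_cmd, and the toy
PASID-to-ASID mapping are hypothetical and do not reflect the real SMMU
command format. The sketch only shows why a generic request cannot simply be
copied into the hardware queue: the host adds a VMID the guest never sees
and substitutes its own ASID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generic scope bits, mirroring the flags quoted in the thread. */
#define INV_SCOPE_PASID	(1u << 0)
#define INV_SCOPE_VADDR	(1u << 1)

/* Hypothetical vendor-neutral request, as the guest side would describe it. */
struct generic_inv {
	uint32_t scope;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
};

/* Hypothetical host command; NOT the real SMMU command layout. */
struct host_inv_cmd {
	uint16_t vmid;	/* chosen by the host, never visible to the guest */
	uint16_t asid;	/* host-side ID derived from the guest PASID */
	bool by_va;	/* true for address-selective invalidation */
	uint64_t vaddr;
	uint64_t size;
};

/* Toy PASID -> ASID mapping, standing in for a driver-private lookup. */
static uint16_t lookup_asid(uint32_t pasid)
{
	return (uint16_t)(pasid + 100);
}

/* Returns 0 on success, -1 if the request is malformed. */
int build_host_cmd(const struct generic_inv *req, uint16_t vmid,
		   struct host_inv_cmd *cmd)
{
	assert(req && cmd);

	if (req->scope & ~(INV_SCOPE_PASID | INV_SCOPE_VADDR))
		return -1;	/* unknown scope bits */
	if ((req->scope & INV_SCOPE_VADDR) && req->size == 0)
		return -1;	/* address-selective needs a range */

	cmd->vmid = vmid;
	cmd->asid = (req->scope & INV_SCOPE_PASID) ? lookup_asid(req->pasid) : 0;
	cmd->by_va = (req->scope & INV_SCOPE_VADDR) != 0;
	cmd->vaddr = cmd->by_va ? req->vaddr : 0;
	cmd->size = cmd->by_va ? req->size : 0;
	return 0;
}
```

Note that the sketch accepts INV_SCOPE_VADDR without INV_SCOPE_PASID, which
corresponds to the "flush all by VA" case mentioned for ARM SMMU; a VT-d
backend would presumably reject that combination.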

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-05 12:42                       ` Jean-Philippe Brucker
@ 2017-07-05 17:28                           ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-07-05 17:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Lan, Tianyu, Liu, Yi L, Tian, Kevin, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

On Wed, 5 Jul 2017 13:42:03 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> On 05/07/17 07:45, Tian, Kevin wrote:
> >> From: Liu, Yi L
> >> Sent: Monday, July 3, 2017 6:31 PM
> >>
> >> Hi Jean,
> >>
> >>  
> >>>  
> >>>> 2. Define a structure in include/uapi/linux/iommu.h (newly added header
> >>>> file)
> >>>>
> >>>> struct iommu_tlb_invalidate {
> >>>> 	__u32	scope;
> >>>> /* pasid-selective invalidation described by @pasid */
> >>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> >>>> /* address-selective invalidation described by (@vaddr, @size) */
> >>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)  
> > 
> > For VT-d above two flags are related. There is no method of flushing
> > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > selective invalidation is valid only for a given pasid. So it's not appropriate
> > to put them in same level of scope definition at least for VT-d.  
> 
> For ARM SMMU the "flush all by VA" operation is valid. Although it's
> unclear at this point if we will ever allow that, it should probably stay
> in the common format, if there is one.
> 
> >>>> 	__u32	flags;
> >>>> /*  targets non-pasid mappings, @pasid is not valid */
> >>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)  
> >>>
> >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> >>> using a special mode where PASID 0 is reserved and any traffic without
> >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> >>> to invalidate that special context explicitly. But this means that
> >>> invalidation packet targeted at that context will have "scope = PASID" and
> >>> "flags = NO_PASID", which is utterly confusing.
> >>>
> >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> >>> and just use PASID 0 to invalidate this context on ARM. I don't think
> >>> other architectures would use the NO_PASID flag anyway, but I might be
> >>> mistaken.
> >>
> >> I would suggest keeping it for now. On VT-d, we may pass some data in the
> >> opaque field, so we can work without it. But if another vendor wants to
> >> issue a non-PASID tagged cache invalidation, it may run into problems.
> > 
> > I'm worried about the criteria for deciding which attributes should be
> > abstracted in the common structure and which can be left opaque. It doesn't
> > make much sense to do such abstraction purely because different vendor
> > formats have some common fields. Usually we do such abstraction because
> > vendor-agnostic code needs to do some common handling before going to
> > vendor-specific code. However, in this case VFIO is not expected to do
> > anything with those IOMMU-specific attributes. The structure is directly
> > forwarded to the IOMMU driver, which simply translates it into
> > vendor-specific opaque data again. Then why bother doing double translation
> > on both the Qemu and IOMMU driver sides?
> > 
> > Take VT-d for example. Below is a summary of all possible selections around
> > invalidation of 1st level structure for svm:
> > 
> > Scope: All PASIDs, single PASID
> > for each PASID:
> > 	all mappings, or page-selective mappings (addr, size)
> > invalidation target:
> > 	IOTLB entries (leaf)
> > 	paging structure cache (non-leaf)  
> 
> I'm curious, can you invalidate all intermediate paging structures for a
> given PASID without invalidating the leaves?
> 
> > 	PASID cache (pasid->cr3)  
> I guess any implementation that gives the whole PASID table to userspace
> will need the PASID cache invalidation. This was missing from my proposal
> since it was from virtio-iommu.
> 
> > invalidation hint:
> > 	whether global pages are included
> > 	drain reads/writes
> > 
> > The above are fairly architectural attributes if we look only at their
> > functional purpose. So if we really consider defining a common structure, it
> > might be more natural to define a superset of all vendors' capabilities
> > and remove the opaque field altogether. But as said earlier, the purpose of
> > doing such abstraction is not clear if there is no vendor-agnostic
> > user actually digesting those fields. Then should we reconsider the
> > fully opaque approach?
> > 
> > Comments are welcome, since I may have overlooked something here. :-)
> 
> I guess on x86 the invalidation packet formats are stable, but for ARM I'm
> reluctant to deal with vendor-specific formats at the API level, because
> they tend to be volatile. If a virtual IOMMU version is different from the
> physical one, then the page table format will be the same but invalidation
> format will not.
> 
> So it would be good to define common fields that have the same effects
> regardless of the underlying pIOMMU. And the fields that differ between
> ARM and x86 seem to only be hints.
> 
> In addition on ARM SMMU, the guest cannot build an invalidation command
> that the host could simply copy into the hardware command queue. The
> pIOMMU driver needs to craft an invalidation command with a Virtual
> Machine ID, that the guest is never aware of, and a separate ATS
> invalidation command. It might also need to retrieve an Address Space ID
> associated with the given PASID if it chose to hide it from the guest.
> 
> So for us the invalidation structure would always be different from the
> hardware one. That's why I do not have any reason to prefer an opaque
> structure in the first place, and defining generic fields looks much
> neater :) Then again, I don't have any strong technical objection to it.

I have an objection to opaque data: it's not documented for users,
can't be considered a stable ABI, introduces compatibility issues, and
makes debugging difficult.  vfio should have the right to and the
ability to validate anything coming from the user, whether it's vendor
specific or generic.  Your concern about hardware changing is just as
valid on VT-d.  Even if we're emulating VT-d in userspace on a VT-d
host, how do we know that the two are strictly compatible?  It may be
today, but we cannot predict the future.  A fully specified ABI means
that we can properly version it, and if necessary provide compatibility
handlers if the hardware changes.  Thanks,

Alex
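
[Illustration only, not part of the posted patches.] The validation and
versioning argument above can be sketched as follows. This is a minimal
example under stated assumptions: the structure inv_request_v1, its field
layout, the flag names and validate_inv_request are all hypothetical. The
only borrowed idea is the leading argsz field, which existing VFIO ioctl
structures commonly use so that the kernel can check the size userspace
believes it is passing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for a fully specified request. */
#define INV_FLAG_PASID	(1u << 0)
#define INV_FLAG_VADDR	(1u << 1)
#define INV_FLAGS_KNOWN	(INV_FLAG_PASID | INV_FLAG_VADDR)

/* Hypothetical version 1 layout; every field has a documented meaning. */
struct inv_request_v1 {
	uint32_t argsz;		/* size of the structure passed by userspace */
	uint32_t version;	/* ABI version, 1 for this layout */
	uint32_t flags;		/* INV_FLAG_* bits */
	uint32_t pasid;		/* valid if INV_FLAG_PASID is set */
	uint64_t vaddr;		/* valid if INV_FLAG_VADDR is set */
	uint64_t size;		/* valid if INV_FLAG_VADDR is set */
};

/*
 * Checks generic code could perform before handing the request to any
 * vendor driver. Returns 0 if the request is acceptable, -1 otherwise.
 */
int validate_inv_request(const struct inv_request_v1 *req)
{
	assert(req);

	if (req->argsz < sizeof(*req))
		return -1;	/* caller passed a truncated structure */
	if (req->version != 1)
		return -1;	/* unknown version: needs a compat handler */
	if (req->flags & ~INV_FLAGS_KNOWN)
		return -1;	/* reject bits this version does not define */
	if ((req->flags & INV_FLAG_VADDR) && req->size == 0)
		return -1;	/* address-selective needs a non-empty range */
	return 0;
}
```

None of these checks would be possible on an opaque blob, since the generic
layer would not know where the version or flags live inside it.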

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-05 17:28                           ` Alex Williamson
@ 2017-07-05 22:26                               ` Tian, Kevin
  -1 siblings, 0 replies; 116+ messages in thread
From: Tian, Kevin @ 2017-07-05 22:26 UTC (permalink / raw)
  To: Alex Williamson, Jean-Philippe Brucker
  Cc: Lan, Tianyu, Liu, Yi L, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

> From: Alex Williamson [mailto:alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> Sent: Thursday, July 6, 2017 1:28 AM
> 
> On Wed, 5 Jul 2017 13:42:03 +0100
> Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> 
> > On 05/07/17 07:45, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Monday, July 3, 2017 6:31 PM
> > >>
> > >> Hi Jean,
> > >>
> > >>
> > >>>
> > >>>> 2. Define a structure in include/uapi/linux/iommu.h (newly added
> > >>>> header file)
> > >>>>
> > >>>> struct iommu_tlb_invalidate {
> > >>>> 	__u32	scope;
> > >>>> /* pasid-selective invalidation described by @pasid */
> > >>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > >>>> /* address-selective invalidation described by (@vaddr, @size) */
> > >>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > >
> > > For VT-d above two flags are related. There is no method of flushing
> > > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > > selective invalidation is valid only for a given pasid. So it's not appropriate
> > > to put them in same level of scope definition at least for VT-d.
> >
> > For ARM SMMU the "flush all by VA" operation is valid. Although it's
> > unclear at this point if we will ever allow that, it should probably stay
> > in the common format, if there is one.
> >
> > >>>> 	__u32	flags;
> > >>>> /*  targets non-pasid mappings, @pasid is not valid */
> > >>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > >>>
> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> > >>> using a special mode where PASID 0 is reserved and any traffic without
> > >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> > >>> to invalidate that special context explicitly. But this means that
> > >>> invalidation packet targeted at that context will have "scope = PASID" and
> > >>> "flags = NO_PASID", which is utterly confusing.
> > >>>
> > >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> > >>> and just use PASID 0 to invalidate this context on ARM. I don't think
> > >>> other architectures would use the NO_PASID flag anyway, but I might be
> > >>> mistaken.
> > >>
> > >> I would suggest keeping it for now. On VT-d, we may pass some data in the
> > >> opaque field, so we can work without it. But if another vendor wants to
> > >> issue a non-PASID tagged cache invalidation, it may run into problems.
> > >
> > > I'm worried about the criteria for deciding which attributes should be
> > > abstracted in the common structure and which can be left opaque. It doesn't
> > > make much sense to do such abstraction purely because different vendor
> > > formats have some common fields. Usually we do such abstraction because
> > > vendor-agnostic code needs to do some common handling before going to
> > > vendor-specific code. However, in this case VFIO is not expected to do
> > > anything with those IOMMU-specific attributes. The structure is directly
> > > forwarded to the IOMMU driver, which simply translates it into
> > > vendor-specific opaque data again. Then why bother doing double translation
> > > on both the Qemu and IOMMU driver sides?
> > >
> > > Take VT-d for example. Below is a summary of all possible selections around
> > > invalidation of 1st level structure for svm:
> > >
> > > Scope: All PASIDs, single PASID
> > > for each PASID:
> > > 	all mappings, or page-selective mappings (addr, size)
> > > invalidation target:
> > > 	IOTLB entries (leaf)
> > > 	paging structure cache (non-leaf)
> >
> > I'm curious, can you invalidate all intermediate paging structures for a
> > given PASID without invalidating the leaves?
> >
> > > 	PASID cache (pasid->cr3)
> > I guess any implementation that gives the whole PASID table to userspace
> > will need the PASID cache invalidation. This was missing from my proposal
> > since it was from virtio-iommu.
> >
> > > invalidation hint:
> > > 	whether global pages are included
> > > 	drain reads/writes
> > >
> > > The above are fairly architectural attributes if we look only at their
> > > functional purpose. So if we really consider defining a common structure, it
> > > might be more natural to define a superset of all vendors' capabilities
> > > and remove the opaque field altogether. But as said earlier, the purpose of
> > > doing such abstraction is not clear if there is no vendor-agnostic
> > > user actually digesting those fields. Then should we reconsider the
> > > fully opaque approach?
> > >
> > > Comments are welcome, since I may have overlooked something here. :-)
> >
> > I guess on x86 the invalidation packet formats are stable, but for ARM I'm
> > reluctant to deal with vendor-specific formats at the API level, because
> > they tend to be volatile. If a virtual IOMMU version is different from the
> > physical one, then the page table format will be the same but invalidation
> > format will not.
> >
> > So it would be good to define common fields that have the same effects
> > regardless of the underlying pIOMMU. And the fields that differ between
> > ARM and x86 seem to only be hints.
> >
> > In addition on ARM SMMU, the guest cannot build an invalidation command
> > that the host could simply copy into the hardware command queue. The
> > pIOMMU driver needs to craft an invalidation command with a Virtual
> > Machine ID, that the guest is never aware of, and a separate ATS
> > invalidation command. It might also need to retrieve an Address Space ID
> > associated with the given PASID if it chose to hide it from the guest.
> >
> > So for us the invalidation structure would always be different from the
> > hardware one. That's why I do not have any reason to prefer an opaque
> > structure in the first place, and defining generic fields looks much
> > neater :) Then again, I don't have any strong technical objection to it.
> 
> I have an objection to opaque data, it's not documented for users,
> can't be considered a stable ABI, introduces compatibility issues, and

So far there are three options discussed:

1) full opaque
2) partial specified + opaque
3) fully specified

I take your objection as favoring 3), since as long as any opaque data
remains, the same ABI compatibility issue still exists, right? Then
even vendor-specific capabilities would be explicitly defined in
this structure, and we may also need a query interface to learn
which capabilities the underlying hardware supports.

> makes debugging difficult.  vfio should have the right to and the
> ability to validate anything coming from the user, whether it's vendor

This part I didn't quite get. Which part of such a structure would
VFIO validate? From the current description, it is all IOMMU-specific
knowledge.

> specific or generic.  Your concern about hardware changing is just as
> valid on VT-d.  Even if we're emulating VT-d in userspace on a VT-d
> host, how do we know that the two are strictly compatible?  It may be

That makes sense.

> today, but we cannot predict the future.  A fully specified ABI means
> that we can properly version it, and if necessary provide compatibility
> handlers if the hardware changes.  Thanks,
> 

Thanks
Kevin
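
[Illustration only, not part of the posted patches.] The query interface
mentioned above for option 3) could look roughly like the sketch below. It
is a minimal example under stated assumptions: the capability bits are
derived from the VT-d selections listed earlier in the thread, but the names
INV_CAP_*, struct inv_caps and inv_request_supported are hypothetical, and
no such interface exists in the patches under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits a fully specified ABI might advertise. */
#define INV_CAP_PASID_SELECTIVE	(1u << 0)
#define INV_CAP_ADDR_SELECTIVE	(1u << 1)
#define INV_CAP_GLOBAL_HINT	(1u << 2)	/* include global pages */
#define INV_CAP_DRAIN_HINT	(1u << 3)	/* drain reads/writes */

/* Hypothetical result of a capability query against the host IOMMU. */
struct inv_caps {
	uint32_t supported;	/* INV_CAP_* bits the host implements */
};

/*
 * Returns 1 if every capability in @wanted is supported by the host,
 * 0 otherwise. Userspace would check this before issuing a request.
 */
int inv_request_supported(const struct inv_caps *caps, uint32_t wanted)
{
	assert(caps);
	return (wanted & ~caps->supported) == 0;
}
```

With such a bitmap, an emulated IOMMU could refuse to expose a feature to
the guest when the underlying hardware does not report it.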

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-07-05 22:26                               ` Tian, Kevin
  0 siblings, 0 replies; 116+ messages in thread
From: Tian, Kevin @ 2017-07-05 22:26 UTC (permalink / raw)
  To: Alex Williamson, Jean-Philippe Brucker
  Cc: Liu, Yi L, Lan, Tianyu, Liu, Yi L, Raj, Ashok, kvm, jasowang,
	Will Deacon, peterx, qemu-devel, iommu, Pan, Jacob jun

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, July 6, 2017 1:28 AM
> 
> On Wed, 5 Jul 2017 13:42:03 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
> > On 05/07/17 07:45, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Monday, July 3, 2017 6:31 PM
> > >>
> > >> Hi Jean,
> > >>
> > >>
> > >>>
> > >>>> 2. Define a structure in include/uapi/linux/iommu.h (newly added
> > >>>> header file)
> > >>>>
> > >>>> struct iommu_tlb_invalidate {
> > >>>> 	__u32	scope;
> > >>>> /* pasid-selective invalidation described by @pasid */
> > >>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > >>>> /* address-selective invalidation described by (@vaddr, @size) */
> > >>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > >
> > > For VT-d above two flags are related. There is no method of flushing
> > > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > > selective invalidation is valid only for a given pasid. So it's not appropriate
> > > to put them in same level of scope definition at least for VT-d.
> >
> > For ARM SMMU the "flush all by VA" operation is valid. Although it's
> > unclear at this point if we will ever allow that, it should probably stay
> > in the common format, if there is one.
> >
> > >>>> 	__u32	flags;
> > >>>> /*  targets non-pasid mappings, @pasid is not valid */
> > >>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > >>>
> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> > >>> using a special mode where PASID 0 is reserved and any traffic without
> > >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> > >>> to invalidate that special context explicitly. But this means that
> > >>> invalidation packet targeted at that context will have "scope = PASID" and
> > >>> "flags = NO_PASID", which is utterly confusing.
> > >>>
> > >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> > >>> and just use PASID 0 to invalidate this context on ARM. I don't think
> > >>> other architectures would use the NO_PASID flag anyway, but I might be
> > >>> mistaken.
> > >>
> > >> I would suggest keeping it for now. On VT-d, we may pass some data in the
> > >> opaque field, so we can work without it. But if another vendor wants to
> > >> issue a non-PASID tagged cache invalidation, it may run into problems.
> > >
> > > I'm worried about what's the criteria which attribute should be abstracted
> > > in common structure and which can be left to opaque. It doesn't make
> > > much sense to do such abstraction purely because different vendor formats
> > > have some common fields. Usually we do such abstraction because
> > > vendor-agnostic code need to do some common handling before going to
> > > vendor specific code. However in this case VFIO is not expected to do
> > > anything with those IOMMU specific attributes. Then the structure is
> > > directly forwarded to IOMMU driver, which simply translates the structure
> > > into vendor specific opaque data again. Then why bothering to do double
> > > translations in Qemu and IOMMU driver side?
> > >
> > > Take VT-d for example. Below is a summary of all possible selections around
> > > invalidation of 1st level structure for svm:
> > >
> > > Scope: All PASIDs, single PASID
> > > for each PASID:
> > > 	all mappings, or page-selective mappings (addr, size)
> > > invalidation target:
> > > 	IOTLB entries (leaf)
> > > 	paging structure cache (non-leaf)
> >
> > I'm curious, can you invalidate all intermediate paging structures for a
> > given PASID without invalidating the leaves?
> >
> > > 	PASID cache (pasid->cr3)
> > I guess any implementations that gives the whole PASID table to userspace
> > will need the PASID cache invalidation. This was missing from my proposal
> > since it was from virtio-iommu.
> >
> > > invalidation hint:
> > > 	whether global pages are included
> > > 	drain reads/writes
> > >
> > > Above are pretty architectural attributes if just looking at functional
> > > purpose. Then if we really consider defining a common structure, it
> > > might be more natural to define a superset of all vendors' capabilities
> > > and remove the opaque field at all. But as said earlier the purpose of
> > > doing such abstraction is not clear if there is no vendor-agnostic
> > > user actually digesting those fields. Then should we reconsider the
> > > full opaque approach?
> > >
> > > Welcome comments since I may overlook something here. :-)
> >
> > I guess on x86 the invalidation packet formats are stable, but for ARM I'm
> > reluctant to deal with vendor-specific formats at the API level, because
> > they tend to be volatile. If a virtual IOMMU version is different from the
> > physical one, then the page table format will be the same but invalidation
> > format will not.
> >
> > So it would be good to define common fields that have the same effects
> > regardless on the underlying pIOMMU. And the fields that differ between
> > ARM and x86 seem to only be hints.
> >
> > In addition on ARM SMMU, the guest cannot build an invalidation command
> > that the host could simply copy into the hardware command queue. The
> > pIOMMU driver needs to craft an invalidation command with a Virtual
> > Machine ID, that the guest is never aware of, and a separate ATS
> > invalidation command. It might also need to retrieve an Address Space ID
> > associated with the given PASID if it chose to hide it from the guest.
> >
> > So for us the invalidation structure would always be different from the
> > hardware one. That's why I do not have any reason to prefer an opaque
> > structure in the first place, and defining generic fields looks much
> > neater :) Then again, I don't have any strong technical objection to it.
> 
> I have an objection to opaque data, it's not documented for users,
> can't be considered a stable ABI, introduces compatibility issues, and

So far there are three options discussed:

1) full opaque
2) partial specified + opaque
3) fully specified

I take your objection as favoring option 3), since as long as any
opaque data remains, the same ABI compatibility issue still exists,
right? In that case even vendor-specific capabilities would have to be
explicitly defined in this structure, and we may also need a query
interface to discover which capabilities the underlying hardware supports.
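
A minimal sketch of what such a query could boil down to, assuming the
host reports its invalidation features as a bitmask. The capability names
and the helper below are hypothetical and only illustrate the idea; they
are not part of any posted patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, one per invalidation feature. */
#define IOMMU_CAP_INVAL_PASID		(1u << 0) /* PASID-selective */
#define IOMMU_CAP_INVAL_VADDR		(1u << 1) /* address-selective */
#define IOMMU_CAP_INVAL_PASID_CACHE	(1u << 2) /* PASID cache */
#define IOMMU_CAP_INVAL_GLOBAL_HINT	(1u << 3) /* global-pages hint */

/*
 * Hypothetical helper: returns true when every feature in @requested
 * is also present in @host_caps, i.e. the request uses no bit that
 * the underlying hardware does not advertise.
 */
static bool iommu_inval_supported(uint32_t host_caps, uint32_t requested)
{
	return (requested & ~host_caps) == 0;
}
```

Userspace would read the bitmask once and refuse to emulate features the
physical IOMMU lacks, rather than discovering the mismatch per request.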

> makes debugging difficult.  vfio should have the right to and the
> ability to validate anything coming from the user, whether it's vendor

I didn't quite get this part. Which part of such a structure would be
validated by VFIO? From the current description, the fields all carry
IOMMU-specific knowledge.

> specific or generic.  Your concern about hardware changing is just as
> valid on VT-d.  Even if we're emulating VT-d in userspace on a VT-d
> host, how do we know that the two are strictly compatible?  It may be

That makes sense.

> today, but we cannot predict the future.  A fully specified ABI means
> that we can properly version it, and if necessary provide compatibility
> handlers if the hardware changes.  Thanks,
> 

Thanks
Kevin

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-05 12:42                       ` Jean-Philippe Brucker
@ 2017-07-05 22:31                           ` Tian, Kevin
  -1 siblings, 0 replies; 116+ messages in thread
From: Tian, Kevin @ 2017-07-05 22:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Liu, Yi L
  Cc: Lan, Tianyu, kvm, jasowang, Will Deacon, qemu-devel, iommu,
	Pan, Jacob jun

> From: Jean-Philippe Brucker
> Sent: Wednesday, July 5, 2017 8:42 PM
> 
> On 05/07/17 07:45, Tian, Kevin wrote:
> >> From: Liu, Yi L
> >> Sent: Monday, July 3, 2017 6:31 PM
> >>
> >> Hi Jean,
> >>
> >>
> >>>
> >>>> 2. Define a structure in include/uapi/linux/iommu.h (newly added
> >>>> header file)
> >>>>
> >>>> struct iommu_tlb_invalidate {
> >>>> 	__u32	scope;
> >>>> /* pasid-selective invalidation described by @pasid */
> >>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> >>>> /* address-selective invalidation described by (@vaddr, @size) */
> >>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> >
> > For VT-d above two flags are related. There is no method of flushing
> > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > selective invalidation is valid only for a given pasid. So it's not appropriate
> > to put them in same level of scope definition at least for VT-d.
> 
> For ARM SMMU the "flush all by VA" operation is valid. Although it's
> unclear at this point if we will ever allow that, it should probably stay
> in the common format, if there is one.

Fine to keep it in the common format. Earlier I was wondering whether it
should be in scope; on second thought, it is possibly fine. :-)

> 
> >>>> 	__u32	flags;
> >>>> /*  targets non-pasid mappings, @pasid is not valid */
> >>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> >>>
> >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> >>> using a special mode where PASID 0 is reserved and any traffic without
> >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> >>> to invalidate that special context explicitly. But this means that
> >>> invalidation packet targeted at that context will have "scope = PASID" and
> >>> "flags = NO_PASID", which is utterly confusing.
> >>>
> >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> >>> and just use PASID 0 to invalidate this context on ARM. I don't think
> >>> other architectures would use the NO_PASID flag anyway, but might be
> >>> mistaken.
> >>
> >> I may suggest to keep it so far. On VT-d, we may pass some data in opaque,
> >> so we may work without it. But if other vendor want to issue non-PASID
> >> tagged cache, then may encounter problem.
> >
> > I'm worried about what's the criteria which attribute should be abstracted
> > in common structure and which can be left to opaque. It doesn't make
> > much sense to do such abstraction purely because different vendor formats
> > have some common fields. Usually we do such abstraction because
> > vendor-agnostic code need to do some common handling before going to
> > vendor specific code. However in this case VFIO is not expected to do
> > anything with those IOMMU specific attributes. Then the structure is
> > directly forwarded to IOMMU driver, which simply translates the structure
> > into vendor specific opaque data again. Then why bothering to do double
> > translations in Qemu and IOMMU driver side?
> >
> > Take VT-d for example. Below is a summary of all possible selections around
> > invalidation of 1st level structure for svm:
> >
> > Scope: All PASIDs, single PASID
> > for each PASID:
> > 	all mappings, or page-selective mappings (addr, size)
> > invalidation target:
> > 	IOTLB entries (leaf)
> > 	paging structure cache (non-leaf)
> 
> I'm curious, can you invalidate all intermediate paging structures for a
> given PASID without invalidating the leaves?

I don't think so. Usually the IOTLB flush is the base operation; one can
further specify whether the flush should also apply to non-leaf entries.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-05 17:28                           ` Alex Williamson
@ 2017-07-14  8:58                               ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-14  8:58 UTC (permalink / raw)
  To: Alex Williamson, Jean-Philippe Brucker
  Cc: Lan, Tianyu, Liu, Yi L, Tian, Kevin, kvm, jasowang, Will Deacon,
	qemu-devel, iommu, Pan, Jacob jun

Hi Alex,

Regarding the open question about opaque data, I'd like to propose the
following definition based on the existing comments. Please note that I've
merged the PASID table binding and IOMMU TLB invalidation into a single
IOCTL, with different flags to indicate the IOMMU operation. Per Kevin's
comments, there may be IOMMU invalidation for the guest IOVA TLB, so I
renamed the IOCTL and data structure to be non-SVM specific. Please kindly
review, so that we can close the open question on opaque data and move
forward. Comments and ideas are welcome, including on the scope and flags
definitions in struct iommu_tlb_invalidate.

1. Add a VFIO IOCTL for iommu operations from user-space

#define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)

Corresponding data structure:
struct vfio_iommu_operation_info {
	__u32	argsz;
#define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID table */
#define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
#define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest MMU page table */
#define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate IOMMU TLB */
	__u32	flag;
	__u32	length;	/* length of the data[] part, in bytes */
	__u8	data[];	/* data for the IOMMU op indicated by the flag field */
};

For IOMMU TLB invalidation from userspace, the "__u8 data[]" field holds
data that is parsed as the "struct iommu_tlb_invalidate" defined
below.
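
As a rough illustration of how userspace might fill this container, here
is a minimal sketch. It mirrors the proposed layout with <stdint.h> types
so that it builds outside the kernel tree; the helper iommu_op_alloc() is
hypothetical, and setting argsz to header plus payload size is an
assumption modelled on how other VFIO ioctls use argsz:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the proposed flag; not an existing kernel header. */
#define VFIO_IOMMU_INVAL_IOTLB	(1u << 3)

/* Local mirror of the proposed container structure. */
struct vfio_iommu_operation_info {
	uint32_t argsz;
	uint32_t flag;
	uint32_t length;	/* length of data[], in bytes */
	uint8_t  data[];	/* payload selected by flag */
};

/*
 * Hypothetical helper: allocate a request carrying @len bytes of
 * @payload. Returns NULL on allocation failure; the caller frees
 * the result with free().
 */
static struct vfio_iommu_operation_info *
iommu_op_alloc(uint32_t flag, const void *payload, uint32_t len)
{
	size_t total = sizeof(struct vfio_iommu_operation_info) + len;
	struct vfio_iommu_operation_info *op = calloc(1, total);

	if (!op)
		return NULL;
	op->argsz = (uint32_t)total;	/* assumed: header + payload */
	op->flag = flag;
	op->length = len;
	if (len)
		memcpy(op->data, payload, len);
	return op;
}
```

The resulting buffer would then be passed as the ioctl argument; the
kernel side can check argsz and length against each other before it looks
at data[].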

2. Definitions in include/uapi/linux/iommu.h (newly added header file)

/* IOMMU model definition for iommu operations from userspace */
enum iommu_model {
	INTEL_IOMMU,
	ARM_SMMU,
	AMD_IOMMU,
	SPAPR_IOMMU,
	S390_IOMMU,
};

struct iommu_tlb_invalidate {
	__u32	scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID	(1 << 0)
/* address-selective invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR	(1 << 1)
	__u32	flags;
/*  targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
/*
 * Indicates that the pIOMMU doesn't need to invalidate all
 * intermediate tables cached as part of the PTE for vaddr,
 * only the last-level entry (PTE). This is a hint.
 */
#define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	enum iommu_model model;
	/*
	 * Vendors may have different HW versions, so the data[] part
	 * of this structure may differ; sub_version indicates which
	 * variant is in use.
	 */
	__u32	sub_version;
	__u64	length;	/* length of the data[] part, in bytes */
	__u8	data[];
};

For Intel, the data structure is:
struct intel_iommu_invalidate_data {
	__u64 low;
	__u64 high;
};
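
To make the nesting concrete, here is a minimal sketch of building a
page-selective invalidation for one PASID with the Intel payload attached.
The structures are local mirrors of the proposal using <stdint.h> types,
the enum spelling INTEL_IOMMU is assumed, and intel_inval_page() is a
hypothetical helper. Combining the PASID and VADDR scope bits is my
reading of Kevin's remark that address-selective invalidation on VT-d is
only valid for a given PASID:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirrors of the proposed definitions; not existing uapi headers. */
enum iommu_model { INTEL_IOMMU, ARM_SMMU, AMD_IOMMU, SPAPR_IOMMU, S390_IOMMU };

#define IOMMU_INVALIDATE_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR	(1u << 1)

struct iommu_tlb_invalidate {
	uint32_t scope;
	uint32_t flags;
	uint32_t pasid;
	uint64_t vaddr;
	uint64_t size;
	enum iommu_model model;
	uint32_t sub_version;
	uint64_t length;	/* length of data[], in bytes */
	uint8_t  data[];
};

struct intel_iommu_invalidate_data {
	uint64_t low;
	uint64_t high;
};

/*
 * Hypothetical helper: describe an invalidation of [vaddr, vaddr + size)
 * for @pasid and append the vendor-specific payload @desc.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct iommu_tlb_invalidate *
intel_inval_page(uint32_t pasid, uint64_t vaddr, uint64_t size,
		 const struct intel_iommu_invalidate_data *desc)
{
	struct iommu_tlb_invalidate *inv;

	inv = calloc(1, sizeof(*inv) + sizeof(*desc));
	if (!inv)
		return NULL;
	/* Assumed: scope is a bitmask, so both selectors are set. */
	inv->scope = IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR;
	inv->pasid = pasid;
	inv->vaddr = vaddr;
	inv->size = size;
	inv->model = INTEL_IOMMU;
	inv->length = sizeof(*desc);
	memcpy(inv->data, desc, sizeof(*desc));
	return inv;
}
```

The generic fields describe what to invalidate, while data[] carries the
vendor-specific part that the generic layer does not interpret.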

Thanks,
Yi L

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, July 6, 2017 1:28 AM
> To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan,
> Tianyu <tianyu.lan@intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Raj, Ashok
> <ashok.raj@intel.com>; kvm@vger.kernel.org; jasowang@redhat.com; Will Deacon
> <Will.Deacon@arm.com>; peterx@redhat.com; qemu-devel@nongnu.org;
> iommu@lists.linux-foundation.org; Pan, Jacob jun <jacob.jun.pan@intel.com>
> Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> invalidate propagation
> 
> On Wed, 5 Jul 2017 13:42:03 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
> > On 05/07/17 07:45, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Monday, July 3, 2017 6:31 PM
> > >>
> > >> Hi Jean,
> > >>
> > >>
> > >>>
> > >>>> 2. Define a structure in include/uapi/linux/iommu.h (newly added
> > >>>> header file)
> > >>>>
> > >>>> struct iommu_tlb_invalidate {
> > >>>> 	__u32	scope;
> > >>>> /* pasid-selective invalidation described by @pasid */
> > >>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > >>>> /* address-selective invalidation described by (@vaddr, @size) */
> > >>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > >
> > > For VT-d above two flags are related. There is no method of flushing
> > > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > > selective invalidation is valid only for a given pasid. So it's not
> > > appropriate to put them in same level of scope definition at least for VT-d.
> >
> > For ARM SMMU the "flush all by VA" operation is valid. Although it's
> > unclear at this point if we will ever allow that, it should probably
> > stay in the common format, if there is one.
> >
> > >>>> 	__u32	flags;
> > >>>> /*  targets non-pasid mappings, @pasid is not valid */
> > >>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > >>>
> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU,
> > >>> we're using a special mode where PASID 0 is reserved and any
> > >>> traffic without PASID uses entry 0 of the PASID table. So I
> > >>> proposed the "NO_PASID" flag to invalidate that special context
> > >>> explicitly. But this means that invalidation packet targeted at
> > >>> that context will have "scope = PASID" and "flags = NO_PASID", which
> > >>> is utterly confusing.
> > >>>
> > >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID
> > >>> flag and just use PASID 0 to invalidate this context on ARM. I don't
> > >>> think other architectures would use the NO_PASID flag anyway, but
> > >>> might be mistaken.
> > >>
> > >> I may suggest to keep it so far. On VT-d, we may pass some data in
> > >> opaque, so we may work without it. But if other vendor want to
> > >> issue non-PASID tagged cache, then may encounter problem.
> > >
> > > I'm worried about what's the criteria which attribute should be
> > > abstracted in common structure and which can be left to opaque. It
> > > doesn't make much sense to do such abstraction purely because
> > > different vendor formats have some common fields. Usually we do such
> > > abstraction because vendor-agnostic code need to do some common
> > > handling before going to vendor specific code. However in this case
> > > VFIO is not expected to do anything with those IOMMU specific
> > > attributes. Then the structure is directly forwarded to IOMMU
> > > driver, which simply translates the structure into vendor specific
> > > opaque data again. Then why bothering to do double translations in
> > > Qemu and IOMMU driver side?
> > >
> > > Take VT-d for example. Below is a summary of all possible selections
> > > around invalidation of 1st level structure for svm:
> > >
> > > Scope: All PASIDs, single PASID
> > > for each PASID:
> > > 	all mappings, or page-selective mappings (addr, size)
> > > invalidation target:
> > > 	IOTLB entries (leaf)
> > > 	paging structure cache (non-leaf)
> >
> > I'm curious, can you invalidate all intermediate paging structures for
> > a given PASID without invalidating the leaves?
> >
> > > 	PASID cache (pasid->cr3)
> > I guess any implementations that gives the whole PASID table to
> > userspace will need the PASID cache invalidation. This was missing
> > from my proposal since it was from virtio-iommu.
> >
> > > invalidation hint:
> > > 	whether global pages are included
> > > 	drain reads/writes
> > >
> > > Above are pretty architectural attributes if just looking at
> > > functional purpose. Then if we really consider defining a common
> > > structure, it might be more natural to define a superset of all
> > > vendors' capabilities and remove the opaque field at all. But as
> > > said earlier the purpose of doing such abstraction is not clear if
> > > there is no vendor-agnostic user actually digesting those fields.
> > > Then should we reconsider the full opaque approach?
> > >
> > > Welcome comments since I may overlook something here. :-)
> >
> > I guess on x86 the invalidation packet formats are stable, but for ARM
> > I'm reluctant to deal with vendor-specific formats at the API level,
> > because they tend to be volatile. If a virtual IOMMU version is
> > different from the physical one, then the page table format will be
> > the same but invalidation format will not.
> >
> > So it would be good to define common fields that have the same effects
> > regardless on the underlying pIOMMU. And the fields that differ
> > between ARM and x86 seem to only be hints.
> >
> > In addition on ARM SMMU, the guest cannot build an invalidation
> > command that the host could simply copy into the hardware command
> > queue. The pIOMMU driver needs to craft an invalidation command with a
> > Virtual Machine ID, that the guest is never aware of, and a separate
> > ATS invalidation command. It might also need to retrieve an Address
> > Space ID associated with the given PASID if it chose to hide it from the guest.
> >
> > So for us the invalidation structure would always be different from
> > the hardware one. That's why I do not have any reason to prefer an
> > opaque structure in the first place, and defining generic fields looks
> > much neater :) Then again, I don't have any strong technical objection to it.
> 
> I have an objection to opaque data, it's not documented for users, can't be
> considered a stable ABI, introduces compatibility issues, and makes debugging
> difficult.  vfio should have the right to and the ability to validate anything coming
> from the user, whether it's vendor specific or generic.  Your concern about hardware
> changing is just as valid on VT-d.  Even if we're emulating VT-d in userspace on a
> VT-d host, how do we know that the two are strictly compatible?  It may be today, but
> we cannot predict the future.  A fully specified ABI means that we can properly
> version it, and if necessary provide compatibility handlers if the hardware changes.
> Thanks,
> 
> Alex

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-07-14  8:58                               ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-14  8:58 UTC (permalink / raw)
  To: Alex Williamson, Jean-Philippe Brucker
  Cc: Tian, Kevin, Liu, Yi L, Lan, Tianyu, Raj, Ashok, kvm, jasowang,
	Will Deacon, peterx, qemu-devel, iommu, Pan, Jacob jun,
	Joerg Roedel

Hi Alex,

Against to the opaque open, I'd like to propose the following definition
based on the existing comments. Pls note that I've merged the pasid
table binding and iommu tlb invalidation into a single IOCTL and make
different flags to indicate the iommu operations. Per Kevin's comments,
there may be iommu invalidation for guest IOVA tlb, so I renamed the
IOCTL and data structure to be non-svm specific. Pls kindly have a review,
so that we can make the opaque open closed and move forward. Surely,
comments and ideas are welcomed. And for the scope and flags definition
in struct iommu_tlb_invalidate, it's also welcomed to give your ideas on it.

1. Add a VFIO IOCTL for iommu operations from user-space

#define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)

Corresponding data structure:
struct vfio_iommu_operation_info {
	__u32	argsz;
#define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
#define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver*/
#define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
#define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
	__u32	flag;
	__u32	length; // length of the data[] part in byte
	__u8	data[]; // stores the data for iommu op indicated by flag field
};

For iommu tlb invalidation from userspace, the "__u8 data[]" stores
data which would be parsed by the "struct iommu_tlb_invalidate" defined
below.

2. Definitions in include/uapi/linux/iommu.h(newly added header file)

/* IOMMU model definition for iommu operations from userspace */
enum iommu_model {
	INTLE_IOMMU,
	ARM_SMMU,
	AMD_IOMMU,
	SPAPR_IOMMU,
	S390_IOMMU,
};

struct iommu_tlb_invalidate {
	__u32	scope;
/* pasid-selective invalidation described by @pasid */
#define IOMMU_INVALIDATE_PASID	(1 << 0)
/* address-selevtive invalidation described by (@vaddr, @size) */
#define IOMMU_INVALIDATE_VADDR	(1 << 1)
	__u32	flags;
/*  targets non-pasid mappings, @pasid is not valid */
#define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
/* indicating that the pIOMMU doesn't need to invalidate
	all intermediate tables cached as part of the PTE for
	vaddr, only the last-level entry (pte). This is a hint. */
#define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
	__u32	pasid;
	__u64	vaddr;
	__u64	size;
	enum iommu_model model;
	/*
	 Vendor may have different HW version and thus the
	 data part of this structure differs, use sub_version
	 to indicate such difference.
	 */
	__u322 sub_version;
	__u64 length; // length of the data[] part in byte
	__u8	data[];
};

For Intel, the data structue is:
struct intel_iommu_invalidate_data {
	__u64 low;
	__u64 high;
}

Thanks,
Yi L

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, July 6, 2017 1:28 AM
> To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan,
> Tianyu <tianyu.lan@intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Raj, Ashok
> <ashok.raj@intel.com>; kvm@vger.kernel.org; jasowang@redhat.com; Will Deacon
> <Will.Deacon@arm.com>; peterx@redhat.com; qemu-devel@nongnu.org;
> iommu@lists.linux-foundation.org; Pan, Jacob jun <jacob.jun.pan@intel.com>
> Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> invalidate propagation
> 
> On Wed, 5 Jul 2017 13:42:03 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
> > On 05/07/17 07:45, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Monday, July 3, 2017 6:31 PM
> > >>
> > >> Hi Jean,
> > >>
> > >>
> > >>>
> > >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added
> > >>>> header
> > >> file)
> > >>>>
> > >>>> struct iommu_tlb_invalidate {
> > >>>> 	__u32	scope;
> > >>>> /* pasid-selective invalidation described by @pasid */
> > >>>> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */
> > >>>> #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > >
> > > For VT-d above two flags are related. There is no method of flushing
> > > (@vaddr, @size) for all pasids, which doesn't make sense. address-
> > > selective invalidation is valid only for a given pasid. So it's not
> > > appropriate to put them in same level of scope definition at least for VT-d.
> >
> > For ARM SMMU the "flush all by VA" operation is valid. Although it's
> > unclear at this point if we will ever allow that, it should probably
> > stay in the common format, if there is one.
> >
> > >>>> 	__u32	flags;
> > >>>> /*  targets non-pasid mappings, @pasid is not valid */
> > >>>> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > >>>
> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU,
> > >>> we're using a special mode where PASID 0 is reserved and any
> > >>> traffic without PASID uses entry 0 of the PASID table. So I
> > >>> proposed the "NO_PASID" flag to invalidate that special context
> > >>> explicitly. But this means that invalidation packet targeted at
> > >>> that context will have "scope = PASID" and "flags = NO_PASID", which is utterly
> confusing.
> > >>>
> > >>> I now think that we should get rid of the
> > >>> IOMMU_INVALIDATE_NO_PASID
> > >> flag
> > >>> and just use PASID 0 to invalidate this context on ARM. I don't
> > >>> think other architectures would use the NO_PASID flag anyway, but
> > >>> might be
> > >> mistaken.
> > >>
> > >> I may suggest to keep it so far. On VT-d, we may pass some data in
> > >> opaque, so we may work without it. But if other vendor want to
> > >> issue non-PASID tagged cache, then may encounter problem.
> > >
> > > I'm worried about what's the criteria which attribute should be
> > > abstracted in common structure and which can be left to opaque. It
> > > doesn't make much sense to do such abstraction purely because
> > > different vendor formats have some common fields. Usually we do such
> > > abstraction because vendor-agnostic code need to do some common
> > > handling before going to vendor specific code. However in this case
> > > VFIO is not expected to do anything with those IOMMU specific
> > > attributes. Then the structure is directly forwarded to IOMMU
> > > driver, which simply translates the structure into vendor specific
> > > opaque data again. Then why bothering to do double translations in
> > > Qemu and IOMMU driver side?> Take VT-d for example. Below is a
> > > summary of all possible selections around invalidation of 1st level structure for
> svm:
> > >
> > > Scope: All PASIDs, single PASID
> > > for each PASID:
> > > 	all mappings, or page-selective mappings (addr, size) invalidation
> > > target:
> > > 	IOTLB entries (leaf)
> > > 	paging structure cache (non-leaf)
> >
> > I'm curious, can you invalidate all intermediate paging structures for
> > a given PASID without invalidating the leaves?
> >
> > > 	PASID cache (pasid->cr3)
> > I guess any implementations that gives the whole PASID table to
> > userspace will need the PASID cache invalidation. This was missing
> > from my proposal since it was from virtio-iommu.
> >
> > > invalidation hint:
> > > 	whether global pages are included
> > > 	drain reads/writes>
> > > Above are pretty architectural attributes if just looking at
> > > functional purpose. Then if we really consider defining a common
> > > structure, it might be more natural to define a superset of all
> > > vendors' capabilities and remove the opaque field at all. But as
> > > said earlier the purpose of doing such abstraction is not clear if
> > > there is no vendor-agnostic user actually digesting those fields.
> > > Then should we reconsider the full opaque approach?
> > >
> > > Welcome comments since I may overlook something here. :-)
> >
> > I guess on x86 the invalidation packet formats are stable, but for ARM
> > I'm reluctant to deal with vendor-specific formats at the API level,
> > because they tend to be volatile. If a virtual IOMMU version is
> > different from the physical one, then the page table format will be
> > the same but invalidation format will not.
> >
> > So it would be good to define common fields that have the same effects
> > regardless of the underlying pIOMMU. And the fields that differ
> > between ARM and x86 seem to only be hints.
> >
> > In addition on ARM SMMU, the guest cannot build an invalidation
> > command that the host could simply copy into the hardware command
> > queue. The pIOMMU driver needs to craft an invalidation command with a
> > Virtual Machine ID, that the guest is never aware of, and a separate
> > ATS invalidation command. It might also need to retrieve an Address
> > Space ID associated with the given PASID if it chose to hide it from the guest.
> >
> > So for us the invalidation structure would always be different from
> > the hardware one. That's why I do not have any reason to prefer an
> > opaque structure in the first place, and defining generic fields looks
> > much neater :) Then again, I don't have any strong technical objection to it.
> 
> I have an objection to opaque data, it's not documented for users, can't be
> considered a stable ABI, introduces compatibility issues, and makes debugging
> difficult.  vfio should have the right to and the ability to validate anything coming
> from the user, whether it's vendor specific or generic.  Your concern about hardware
> changing is just as valid on VT-d.  Even if we're emulating VT-d in userspace on a
> VT-d host, how do we know that the two are strictly compatible?  It may be today, but
> we cannot predict the future.  A fully specified ABI means that we can properly
> version it, and if necessary provide compatibility handlers if the hardware changes.
> Thanks,
> 
> Alex

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-14  8:58                               ` Liu, Yi L
@ 2017-07-14 18:15                                   ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-07-14 18:15 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Lan, Tianyu, Liu, Yi L, Tian, Kevin, kvm, jasowang, Will Deacon,
	qemu-devel, iommu, Pan, Jacob jun

On Fri, 14 Jul 2017 08:58:02 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> Hi Alex,
> 
> To close the open question about opaque data, I'd like to propose the
> following definition based on the existing comments. Please note that
> I've merged the PASID table binding and IOMMU TLB invalidation into a
> single IOCTL, with different flags to indicate the IOMMU operation. Per
> Kevin's comments, there may be IOMMU invalidation for the guest IOVA TLB,
> so I renamed the IOCTL and data structure to be non-SVM-specific. Please
> review it so that we can close this open and move forward. Comments and
> ideas are welcome, including on the scope and flags definitions in
> struct iommu_tlb_invalidate.
> 
> 1. Add a VFIO IOCTL for iommu operations from user-space
> 
> #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> 
> Corresponding data structure:
> struct vfio_iommu_operation_info {
> 	__u32	argsz;
> #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> #define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver*/
> #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> #define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
> 	__u32	flag;
> 	__u32	length; // length of the data[] part in byte
> 	__u8	data[]; // stores the data for iommu op indicated by flag field
> };

If we're doing a generic "Ops" ioctl, then we should have an "op" field
which is defined by an enum.  It doesn't make sense to use flags for
this, for example can we set multiple flag bits?  If not then it's not
a good use for a bit field.  I'm also not sure I understand the value
of the "length" field, can't it always be calculated from argsz?
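
As a rough illustration of the two points above (an enum "op" field
instead of flag bits, and deriving the payload size from argsz), a
userspace-compilable sketch might look like the following. The enum
values, the trimmed structure layout and the helper
vfio_iommu_op_data_len() are hypothetical and not part of any posted
patch; stdint types stand in for __u32/__u8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op codes: exactly one operation per call, so an enum
 * (rather than a bit mask) is the natural encoding. */
enum vfio_iommu_op {
	VFIO_IOMMU_OP_BIND_PASIDTBL = 1,
	VFIO_IOMMU_OP_BIND_PASID,
	VFIO_IOMMU_OP_BIND_PGTABLE,
	VFIO_IOMMU_OP_INVAL_IOTLB,
};

/* Hypothetical layout with the separate "length" field dropped. */
struct vfio_iommu_operation_info {
	uint32_t argsz;   /* total size: fixed header plus data[] */
	uint32_t op;      /* one of enum vfio_iommu_op */
	uint8_t  data[];  /* payload for the selected op */
};

/* Payload length implied by argsz, or -1 if argsz cannot even hold
 * the fixed header. */
int64_t vfio_iommu_op_data_len(uint32_t argsz)
{
	size_t hdr = offsetof(struct vfio_iommu_operation_info, data);

	if (argsz < hdr)
		return -1;
	return (int64_t)(argsz - hdr);
}
```

With this layout the fixed header is 8 bytes, so an argsz of 24 implies
a 16-byte payload and anything below 8 is rejected.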

> For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> data which would be parsed by the "struct iommu_tlb_invalidate" defined
> below.
> 
> 2. Definitions in include/uapi/linux/iommu.h (newly added header file)
> 
> /* IOMMU model definition for iommu operations from userspace */
> enum iommu_model {
> 	INTEL_IOMMU,
> 	ARM_SMMU,
> 	AMD_IOMMU,
> 	SPAPR_IOMMU,
> 	S390_IOMMU,
> };
> 
> struct iommu_tlb_invalidate {
> 	__u32	scope;
> /* pasid-selective invalidation described by @pasid */
> #define IOMMU_INVALIDATE_PASID	(1 << 0)
> /* address-selective invalidation described by (@vaddr, @size) */
> #define IOMMU_INVALIDATE_VADDR	(1 << 1)

Again, is a bit field appropriate here, can a user set both bits?

> 	__u32	flags;
> /*  targets non-pasid mappings, @pasid is not valid */
> #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> /* indicating that the pIOMMU doesn't need to invalidate
> 	all intermediate tables cached as part of the PTE for
> 	vaddr, only the last-level entry (pte). This is a hint. */
> #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)

Are we venturing into vendor specific attributes here?

> 	__u32	pasid;
> 	__u64	vaddr;
> 	__u64	size;
> 	enum iommu_model model;

How does a user learn which model(s) are supported by the interface?
How do they learn which ops are supported?  Perhaps a good use for one
of those flag bits in the outer data structure is "probe".

> 	/*
> 	 Vendor may have different HW version and thus the
> 	 data part of this structure differs, use sub_version
> 	 to indicate such difference.
> 	 */
> 	__u32 sub_version;

Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in
the model enum.

> 	__u64 length; // length of the data[] part in byte

Questionably useful vs calculating from argsz again, but it certainly
doesn't need to be a qword :-o

> 	__u8	data[];
> };
> 
> For Intel, the data structure is:
> struct intel_iommu_invalidate_data {
> 	__u64 low;
> 	__u64 high;
> };

high/low what?  This is a pretty weak uapi definition.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-14 18:15                                   ` Alex Williamson
@ 2017-07-17 10:58                                       ` Liu, Yi L
  -1 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-17 10:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lan, Tianyu, Liu, Yi L, Tian, Kevin, kvm, jasowang, Will Deacon,
	qemu-devel, iommu, Pan, Jacob jun

Hi Alex,

Pls refer to the response inline.

> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Alex Williamson
> Sent: Saturday, July 15, 2017 2:16 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; Tian, Kevin
> <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, Tianyu
> <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org;
> jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; peterx@redhat.com;
> qemu-devel@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun
> <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org>
> Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> invalidate propagation
> 
> On Fri, 14 Jul 2017 08:58:02 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > Hi Alex,
> >
> > To close the open question about opaque data, I'd like to propose the
> > following definition based on the existing comments. Please note that
> > I've merged the PASID table binding and IOMMU TLB invalidation into a
> > single IOCTL, with different flags to indicate the IOMMU operation. Per
> > Kevin's comments, there may be IOMMU invalidation for the guest IOVA TLB,
> > so I renamed the IOCTL and data structure to be non-SVM-specific. Please
> > review it so that we can close this open and move forward. Comments and
> > ideas are welcome, including on the scope and flags definitions in
> > struct iommu_tlb_invalidate.
> >
> > 1. Add a VFIO IOCTL for iommu operations from user-space
> >
> > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> >
> > Corresponding data structure:
> > struct vfio_iommu_operation_info {
> > 	__u32	argsz;
> > #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > #define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver*/
> > #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > #define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
> > 	__u32	flag;
> > 	__u32	length; // length of the data[] part in byte
> > 	__u8	data[]; // stores the data for iommu op indicated by flag field
> > };
> 
> If we're doing a generic "Ops" ioctl, then we should have an "op" field which is
> defined by an enum.  It doesn't make sense to use flags for this, for example can we
> set multiple flag bits?  If not then it's not a good use for a bit field.  I'm also not sure I
> understand the value of the "length" field, can't it always be calculated from argsz?

Agreed, an enum would be better. The "length" field could be calculated from argsz; I used
it just to avoid offset calculations. I may remove it.
 
> > For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> > data which would be parsed by the "struct iommu_tlb_invalidate"
> > defined below.
> >
> > 2. Definitions in include/uapi/linux/iommu.h (newly added header file)
> >
> > /* IOMMU model definition for iommu operations from userspace */
> > enum iommu_model {
> > 	INTEL_IOMMU,
> > 	ARM_SMMU,
> > 	AMD_IOMMU,
> > 	SPAPR_IOMMU,
> > 	S390_IOMMU,
> > };
> >
> > struct iommu_tlb_invalidate {
> > 	__u32	scope;
> > /* pasid-selective invalidation described by @pasid */
> > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > /* address-selective invalidation described by (@vaddr, @size) */
> > #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> 
> Again, is a bit field appropriate here, can a user set both bits?

Yes, the user may set both bits. That would invalidate an address range
which is tagged with a PASID value.
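
To make the combinations concrete, here is a small sketch of how the
two scope bits could be decoded. Only the "both bits set" case is
described above; the meaning given to scope == 0 and to VADDR alone, as
well as the enum inval_granule names and scope_to_granule() itself, are
assumptions for illustration only.

```c
#include <stdint.h>

#define IOMMU_INVALIDATE_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Hypothetical granularity names, for illustration only. */
enum inval_granule {
	INVAL_GRANULE_ALL,        /* assumed meaning of scope == 0 */
	INVAL_GRANULE_PASID,      /* all mappings of one PASID */
	INVAL_GRANULE_ADDR,       /* address range without PASID tag (assumed) */
	INVAL_GRANULE_PASID_ADDR, /* address range tagged with a PASID */
	INVAL_GRANULE_INVALID,    /* unknown bits set */
};

/* Map the scope bit mask to a single granularity. */
enum inval_granule scope_to_granule(uint32_t scope)
{
	/* Reject bits this sketch does not know about. */
	if (scope & ~(IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR))
		return INVAL_GRANULE_INVALID;

	switch (scope) {
	case 0:
		return INVAL_GRANULE_ALL;
	case IOMMU_INVALIDATE_PASID:
		return INVAL_GRANULE_PASID;
	case IOMMU_INVALIDATE_VADDR:
		return INVAL_GRANULE_ADDR;
	case IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR:
		return INVAL_GRANULE_PASID_ADDR;
	}
	return INVAL_GRANULE_INVALID;
}
```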

> 
> > 	__u32	flags;
> > /*  targets non-pasid mappings, @pasid is not valid */
> > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > /* indicating that the pIOMMU doesn't need to invalidate
> > 	all intermediate tables cached as part of the PTE for
> > 	vaddr, only the last-level entry (pte). This is a hint. */
> > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
> 
> Are we venturing into vendor specific attributes here?

These two attributes are still under discussion. Jean and I have synced
for several rounds, but we lack comments from other vendors.

Personally, I think both should be generic.
IOMMU_INVALIDATE_NO_PASID indicates that no PASID is used
for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates
that only leaf mappings are invalidated.
I will see whether other vendors object to it. If so, I'm fine with moving
it to the vendor-specific part.
 
> 
> > 	__u32	pasid;
> > 	__u64	vaddr;
> > 	__u64	size;
> > 	enum iommu_model model;
> 
> How does a user learn which model(s) are supported by the interface?
> How do they learn which ops are supported?  Perhaps a good use for one of those
> flag bits in the outer data structure is "probe".

My initial plan was for the user to fill it in; if the underlying HW doesn't support
the model, the kernel refuses to service the request. The user should get a failure and
stop using it. But your suggestion to have a probe or some kind of query makes sense.
How about we add one more operation for that purpose? Besides the
supported model query, I'd like to add more, e.g. the HW IOMMU capabilities.

> 
> > 	/*
> > 	 Vendor may have different HW version and thus the
> > 	 data part of this structure differs, use sub_version
> > 	 to indicate such difference.
> > 	 */
> > 	__u32 sub_version;
> 
> Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model
> enum.

Both are fine with me. Let's see the opinions of others.

> > 	__u64 length; // length of the data[] part in byte
> 
> Questionably useful vs calculating from argsz again, but it certainly doesn't need to
> be a qword :-o

Thanks for the reminder. 32 bits would be enough. The length can certainly be derived from
argsz. However, I would like to leave it here. The reason is:
argsz is in the vfio layer, while the "length" here is actually used in the vendor-specific
iommu driver layer. Deriving it would require vfio to pass argsz or the size of
"struct iommu_tlb_invalidate" to the vendor-specific iommu driver layer as a parameter
or similar. Personally, I prefer to pass it in the structure. If it's better to pass it
as a parameter, I will do that.

> 
> > 	__u8	data[];
> > };
> >
> > For Intel, the data structure is:
> > struct intel_iommu_invalidate_data {
> > 	__u64 low;
> > 	__u64 high;
> > };
> 
> high/low what?  This is a pretty weak uapi definition.  Thanks,

For this part, on Intel platforms, we plan to pass 128 bits of data for the invalidation.
The structure varies from one invalidation type to another. Here is my thought on it: define
a 128-bit union and list the invalidation data details for each invalidation type. What's
your opinion on it? So far, we have 7 invalidation types. The prq response is not
included.

union intel_iommu_invalidate_data {
	struct {
		__u64 low;
		__u64 high;
	} invalidate_data;

	struct {
		__u64 type: 4;
		__u64 gran: 2;
		__u64 rsv1: 10;
		__u64 did: 16;
		__u64 sid: 16;
		__u64 func_mask: 2;
		__u64 rsv2: 14;
		__u64 rsv3: 64;
	} context_cache_inv;
	....
};
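
For comparison, the same context-cache layout can be written with
explicit shifts instead of bit-fields, which fixes the bit positions
independently of how a compiler allocates bit-fields. This is only a
sketch: it assumes the fields are laid out from bit 0 upward in the
order listed above, and struct intel_inv_desc and cc_inv_pack() are
hypothetical names.

```c
#include <stdint.h>

/* The 128-bit descriptor as two qwords. */
struct intel_inv_desc {
	uint64_t low;
	uint64_t high;
};

/* Assumed bit positions, derived from the field widths listed above:
 * type[3:0] gran[5:4] rsv1[15:6] did[31:16] sid[47:32]
 * func_mask[49:48] rsv2[63:50]; the high qword is reserved. */
struct intel_inv_desc cc_inv_pack(uint8_t type, uint8_t gran,
				  uint16_t did, uint16_t sid,
				  uint8_t func_mask)
{
	struct intel_inv_desc d;

	d.low = ((uint64_t)(type & 0xf)) |
		((uint64_t)(gran & 0x3) << 4) |
		((uint64_t)did << 16) |
		((uint64_t)sid << 32) |
		((uint64_t)(func_mask & 0x3) << 48);
	d.high = 0; /* rsv3 */
	return d;
}
```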

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-17 10:58                                       ` Liu, Yi L
  (?)
@ 2017-07-17 22:45                                       ` Alex Williamson
       [not found]                                         ` <20170717164515.2491b3bf-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
  -1 siblings, 1 reply; 116+ messages in thread
From: Alex Williamson @ 2017-07-17 22:45 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Jean-Philippe Brucker, Tian, Kevin, Liu, Yi L, Lan, Tianyu, Raj,
	Ashok, kvm, jasowang, Will Deacon, peterx, qemu-devel, iommu,
	Pan, Jacob jun, Joerg Roedel

On Mon, 17 Jul 2017 10:58:41 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> Hi Alex,
> 
> Pls refer to the response inline.
> 
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> > Of Alex Williamson
> > Sent: Saturday, July 15, 2017 2:16 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; Tian, Kevin
> > <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, Tianyu
> > <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org;
> > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; peterx@redhat.com;
> > qemu-devel@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun
> > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org>
> > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> > invalidate propagation
> > 
> > On Fri, 14 Jul 2017 08:58:02 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > Hi Alex,
> > >
> > > > To close the open question about opaque data, I'd like to propose the
> > > > following definition based on the existing comments. Please note that
> > > > I've merged the PASID table binding and IOMMU TLB invalidation into a
> > > > single IOCTL, with different flags to indicate the IOMMU operation. Per
> > > > Kevin's comments, there may be IOMMU invalidation for the guest IOVA TLB,
> > > > so I renamed the IOCTL and data structure to be non-SVM-specific. Please
> > > > review it so that we can close this open and move forward. Comments and
> > > > ideas are welcome, including on the scope and flags definitions in
> > > > struct iommu_tlb_invalidate.
> > >
> > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > >
> > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > >
> > > Corresponding data structure:
> > > struct vfio_iommu_operation_info {
> > > 	__u32	argsz;
> > > #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > > > #define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver*/
> > > #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > > #define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
> > > 	__u32	flag;
> > > 	__u32	length; // length of the data[] part in byte
> > > 	__u8	data[]; // stores the data for iommu op indicated by flag field
> > > };  
> > 
> > If we're doing a generic "Ops" ioctl, then we should have an "op" field which is
> > defined by an enum.  It doesn't make sense to use flags for this, for example can we
> > set multiple flag bits?  If not then it's not a good use for a bit field.  I'm also not sure I
> > understand the value of the "length" field, can't it always be calculated from argsz?  
> 
> Agreed, enum would be better. "length" field could be calculated from argsz. I used
> it just to avoid offset calculations. May remove it.
>  
> > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> > > data which would be parsed by the "struct iommu_tlb_invalidate"
> > > defined below.
> > >
> > > > 2. Definitions in include/uapi/linux/iommu.h (newly added header file)
> > >
> > > > /* IOMMU model definition for iommu operations from userspace */
> > > > enum iommu_model {
> > > > 	INTEL_IOMMU,
> > > 	ARM_SMMU,
> > > 	AMD_IOMMU,
> > > 	SPAPR_IOMMU,
> > > 	S390_IOMMU,
> > > };
> > >
> > > struct iommu_tlb_invalidate {
> > > 	__u32	scope;
> > > /* pasid-selective invalidation described by @pasid */
> > > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > > > /* address-selective invalidation described by (@vaddr, @size) */
> > > #define IOMMU_INVALIDATE_VADDR	(1 << 1)  
> > 
> > Again, is a bit field appropriate here, can a user set both bits?  
> 
> Yes, the user may set both bits. That would invalidate an address range
> which is tagged with a PASID value.
> 
> >   
> > > 	__u32	flags;
> > > /*  targets non-pasid mappings, @pasid is not valid */
> > > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > > /* indicating that the pIOMMU doesn't need to invalidate
> > > 	all intermediate tables cached as part of the PTE for
> > > 	vaddr, only the last-level entry (pte). This is a hint. */
> > > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)  
> > 
> > Are we venturing into vendor specific attributes here?  
> 
> These two attributes are still under discussion. Jean and I have synced
> for several rounds, but we lack comments from other vendors.
> 
> Personally, I think both should be generic.
> IOMMU_INVALIDATE_NO_PASID indicates that no PASID is used
> for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates
> that only leaf mappings are invalidated.
> I will see whether other vendors object to it. If so, I'm fine with moving
> it to the vendor-specific part.
>  
> >   
> > > 	__u32	pasid;
> > > 	__u64	vaddr;
> > > 	__u64	size;
> > > 	enum iommu_model model;  
> > 
> > How does a user learn which model(s) are supported by the interface?
> > How do they learn which ops are supported?  Perhaps a good use for one of those
> > flag bits in the outer data structure is "probe".  
> 
> My initial plan to user fills it, if the underlying HW doesn't support the
> model, it refuses to service it. User should get a failure and stop to use
> it. But your suggestion to have a probe or kinds of query makes sense.
> How about we add one more operation for such purpose? Besides the
> supported model query, I'd like to add more. E.g the HW IOMMU capabilities.

We also have VFIO_IOMMU_GET_INFO where the structure can be extended
for missing capabilities.  Depending on the capability you want to
describe, this might be a better, existing interface for it.
 
> > > 	/*
> > > 	 Vendor may have different HW version and thus the
> > > 	 data part of this structure differs, use sub_version
> > > 	 to indicate such difference.
> > > 	 */
> > > 	__u322 sub_version;  
> > 
> > Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model
> > enum.  
> 
> Both are fine to me. Just see the opinions from other guys.
> 
> > > 	__u64 length; // length of the data[] part in byte  
> > 
> > Questionably useful vs calculating from argsz again , but it certainly doesn't need to
> > be a qword :-o  
> 
> Thx for the remind. 32bits would be enough. It is surely to get it from argsz. However,
> I would like to leave it here. Reason is:
> argsz is in vfio layer, the "length" here is actually used in vendor-specific iommu driver
> layer. So would require vfio to pass argsz or the size of " struct iommu_tlb_invalidate"
> to vendor-specific iommu driver layer by means of parameter or so. Personally, I prefer
> to pass it in the structure. If it's better to pass it as a parameter, I would do it.

Ok, then the layer that does the copy_from_user will need to validate
that length is fully contained within the copied data structure, we
can't let the user trick the kernel into using kernel memory for this.
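
As a rough illustration of that check, here is a minimal userspace sketch
of the arithmetic only. The struct layout and the helper name
inval_payload_ok() are hypothetical; neither is part of the posted
patches, and a real implementation would sit next to the copy_from_user
call in the vfio layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header; models the idea, not the proposed uapi. */
struct inval_hdr {
	uint32_t argsz;   /* total bytes the caller claims to have passed */
	uint32_t length;  /* bytes of payload that follow the header */
};

/*
 * Return true only if the payload described by @length lies entirely
 * inside the @copied bytes that were actually copied from userspace.
 */
static bool inval_payload_ok(const struct inval_hdr *hdr, size_t copied)
{
	assert(hdr != NULL);

	if (copied < sizeof(*hdr))
		return false;	/* the header itself is truncated */
	if (hdr->argsz > copied)
		return false;	/* claims more than was copied */
	if (hdr->argsz < sizeof(*hdr))
		return false;	/* too small to even hold the header */

	/* Subtract rather than add so the comparison cannot overflow. */
	return hdr->length <= hdr->argsz - sizeof(*hdr);
}
```

With an 8-byte header and argsz = 24, a length of 16 passes and a length
of 17 is refused, so the vendor driver never reads past the copied buffer.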

> > > 	__u8	data[];
> > > };
> > >
> > > For Intel, the data structue is:
> > > struct intel_iommu_invalidate_data {
> > > 	__u64 low;
> > > 	__u64 high;
> > > }  
> > 
> > high/low what?  This is a pretty weak uapi definition.  Thanks,  
> 
> For this part, for Intel platform, we plan to pass a 128 bit data for the invalidation.
> The structure varies from invalidation type to type. Here is my thought on it. Define
> an 128 bits union. List the invalidation data details for each invalidation type. What's
> your opinion on it? So far, we have 7 types for invalidation. The prq response is not
> included.

I want this interface to be fully defined, but at the same time I don't
necessarily want to create useless data structures.  I believe the
intention here is to pass these directly through to a QI entry, where
we must match a hardware definition.  I'm tempted to suggest
referencing the hardware specification, but see below...

A concern for this model is that hardware may trust the iommu driver
not to create QI entries that don't set reserved bits or set invalid
field data.  If it does those kinds of things, it's a kernel driver
bug.  Once exposed to the user, we cannot guarantee that.  Does Intel
have confidence that a user cannot maliciously interfere with other
contexts or the general operation of the invalidation queue if a user is
effectively given direct access?  Will the invalidation data be
sanitized by the iommu driver?
 
> union intel_iommu_invalidate_data {
>  	struct {
> 		__u64 low;
>  		__u64 high;
> 	} invalidate_data;
> 
> 	struct {
> 		__u64 type: 4;
> 		__u64 gran: 2;
> 		__u64 rsv1: 10;
> 		__u64 did: 16;
> 		__u64 sid: 16;
> 		__u64 func_mask: 2;
> 		__u64 rsv2: 14;
> 		__64 rsv3: 64;
> 	} context_cache_inv;
> 	....

Here's part of the issue with not fully defining these, we have did,
sid, and func_mask.  I think we're claiming that the benefit of passing
through the hardware data structure is performance, but the user needs
to replace these IDs to match the physical device rather than the
virtual device, perhaps even entirely recreating it because there's not
necessarily a 1:1 mapping of things like func_mask between virtual and
physical hardware topologies (assuming I'm interpreting these fields
correctly).  Doesn't the kernel also need to validate any such field to
prevent the user spoofing entries for other devices?  Is there any
actual performance benefit remaining vs defining a generic interface
after multiple levels have manipulated, recreated, and sanitized
these structures?  We can't evaluate these sorts of risks if we don't
define what we're passing through.  Thanks,

Alex
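
To make the sanitizing question above concrete, here is a minimal sketch
of what such a step might look like for the context_cache_inv layout
quoted above. The bit positions are taken from that proposed bitfield
and are not checked against the VT-d specification; the names qi_desc,
sanitize_cc_inv() and the CC_* masks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit descriptor as two 64-bit words, as in the proposal. */
struct qi_desc {
	uint64_t low;
	uint64_t high;
};

/* Field positions assumed from the quoted context_cache_inv bitfield. */
#define CC_RSV1_MASK	(UINT64_C(0x3ff) << 6)
#define CC_DID_SHIFT	16
#define CC_DID_MASK	(UINT64_C(0xffff) << CC_DID_SHIFT)
#define CC_SID_SHIFT	32
#define CC_SID_MASK	(UINT64_C(0xffff) << CC_SID_SHIFT)
#define CC_FM_MASK	(UINT64_C(0x3) << 48)
#define CC_RSV2_MASK	(UINT64_C(0x3fff) << 50)

/*
 * Reject descriptors with reserved bits set, then overwrite the
 * guest-supplied did/sid with host values and clear func_mask so the
 * user cannot name another device. Returns 0 on success, -1 on reject.
 */
static int sanitize_cc_inv(struct qi_desc *d, uint16_t host_did,
			   uint16_t host_sid)
{
	assert(d != NULL);

	if (d->high != 0 || (d->low & (CC_RSV1_MASK | CC_RSV2_MASK)))
		return -1;

	d->low &= ~(CC_DID_MASK | CC_SID_MASK | CC_FM_MASK);
	d->low |= (uint64_t)host_did << CC_DID_SHIFT;
	d->low |= (uint64_t)host_sid << CC_SID_SHIFT;
	return 0;
}
```

Note that after this step only type and gran survive from the user's
descriptor, which is the point of the performance question: little of
the "passed through" hardware format is actually left untouched.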

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-17 22:45                                       ` Alex Williamson
@ 2017-07-18  9:38                                             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-18  9:38 UTC (permalink / raw)
  To: Alex Williamson, Liu, Yi L
  Cc: Lan, Tianyu, Liu, Yi L, Tian, Kevin, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

On 17/07/17 23:45, Alex Williamson wrote:
[..]
>>>
>>> How does a user learn which model(s) are supported by the interface?
>>> How do they learn which ops are supported?  Perhaps a good use for one of those
>>> flag bits in the outer data structure is "probe".  
>>
>> My initial plan to user fills it, if the underlying HW doesn't support the
>> model, it refuses to service it. User should get a failure and stop to use
>> it. But your suggestion to have a probe or kinds of query makes sense.
>> How about we add one more operation for such purpose? Besides the
>> supported model query, I'd like to add more. E.g the HW IOMMU capabilities.
> 
> We also have VFIO_IOMMU_GET_INFO where the structure can be extended
> for missing capabilities.  Depending on the capability you want to
> describe, this might be a better, existing interface for it.

It might become hairy when physical IOMMUs start supporting multiple
formats, or when we want to describe multiple page table formats in
addition to PASID tables. I was wondering if sysfs iommu_group would be
better suited for this kind of hardware description with variable-length
properties, but a new ioctl-based probing mechanism would work as well.

Other things that we'll want to describe are fault reporting capability
and PASID range, which would fit better in vfio_device_info.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-18  9:38                                             ` Jean-Philippe Brucker
@ 2017-07-18 14:29                                                 ` Alex Williamson
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Williamson @ 2017-07-18 14:29 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Lan, Tianyu, Liu, Yi L, Tian, Kevin, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

On Tue, 18 Jul 2017 10:38:40 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> On 17/07/17 23:45, Alex Williamson wrote:
> [..]
> >>>
> >>> How does a user learn which model(s) are supported by the interface?
> >>> How do they learn which ops are supported?  Perhaps a good use for one of those
> >>> flag bits in the outer data structure is "probe".    
> >>
> >> My initial plan to user fills it, if the underlying HW doesn't support the
> >> model, it refuses to service it. User should get a failure and stop to use
> >> it. But your suggestion to have a probe or kinds of query makes sense.
> >> How about we add one more operation for such purpose? Besides the
> >> supported model query, I'd like to add more. E.g the HW IOMMU capabilities.  
> > 
> > We also have VFIO_IOMMU_GET_INFO where the structure can be extended
> > for missing capabilities.  Depending on the capability you want to
> > describe, this might be a better, existing interface for it.  
> 
> It might become hairy when physical IOMMUs start supporting multiple
> formats, or when we want to describe multiple page table formats in
> addition to PASID tables. I was wondering if sysfs iommu_group would be
> better suited for this kind of hardware description with variable-length
> properties, but a new ioctl-based probing mechanism would work as well.

Would different groups have different properties?  Perhaps it's related
to the iommu hardware unit supporting the device, which could host one
or more groups.  Each device already has a link to its iommu where we
could add info (/sys/class/iommu/).
 
> Other things that we'll want to describe are fault reporting capability
> and PASID range, which would fit better in vfio_device_info.

Why?  The per device PASID info is in a PCI capability, so wouldn't
this be iommu info?  Isn't the fault reporting also via the iommu?
Thanks,

Alex

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-18 14:29                                                 ` Alex Williamson
  (?)
@ 2017-07-18 15:03                                                 ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 116+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-18 15:03 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, Liu, Yi L, Lan, Tianyu, Raj, Ashok, kvm,
	jasowang, Will Deacon, peterx, qemu-devel, iommu, Pan, Jacob jun,
	Joerg Roedel

On 18/07/17 15:29, Alex Williamson wrote:
> On Tue, 18 Jul 2017 10:38:40 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> On 17/07/17 23:45, Alex Williamson wrote:
>> [..]
>>>>>
>>>>> How does a user learn which model(s) are supported by the interface?
>>>>> How do they learn which ops are supported?  Perhaps a good use for one of those
>>>>> flag bits in the outer data structure is "probe".    
>>>>
>>>> My initial plan to user fills it, if the underlying HW doesn't support the
>>>> model, it refuses to service it. User should get a failure and stop to use
>>>> it. But your suggestion to have a probe or kinds of query makes sense.
>>>> How about we add one more operation for such purpose? Besides the
>>>> supported model query, I'd like to add more. E.g the HW IOMMU capabilities.  
>>>
>>> We also have VFIO_IOMMU_GET_INFO where the structure can be extended
>>> for missing capabilities.  Depending on the capability you want to
>>> describe, this might be a better, existing interface for it.  
>>
>> It might become hairy when physical IOMMUs start supporting multiple
>> formats, or when we want to describe multiple page table formats in
>> addition to PASID tables. I was wondering if sysfs iommu_group would be
>> better suited for this kind of hardware description with variable-length
>> properties, but a new ioctl-based probing mechanism would work as well.
> 
> Would different groups have different properties?  Perhaps it's related
> to the iommu hardware unit supporting the device, which could host one
> or more groups.  Each device already has a link to its iommu where we
> could add info (/sys/class/iommu/).

Yes, /sys/class/iommu might be better than iommu_group for PASID and page
table formats.

>> Other things that we'll want to describe are fault reporting capability
>> and PASID range, which would fit better in vfio_device_info.
> 
> Why?  The per device PASID info is in a PCI capability, so wouldn't
> this be iommu info?  Isn't the fault reporting also via the iommu?

Ah yes, I missed that. If userspace can read the PASID and PRI
capabilities it should be enough.

Inspecting individual device capability might help userspace decide how to
combine multiple devices in a container. For example, if it puts
PASID-capable and non-PASID-capable devices in the same container, the
container probably wouldn't support PASID (but would still support MAP/UNMAP).
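
A minimal sketch of that rule, assuming userspace has already gathered a
capability bitmask per device: the container gets the intersection of the
device capabilities. The DEV_CAP_* bits and container_caps() are
hypothetical names for illustration, not an existing VFIO interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device capability bits collected by userspace. */
#define DEV_CAP_MAP_UNMAP	(1u << 0)
#define DEV_CAP_PASID		(1u << 1)
#define DEV_CAP_PRI		(1u << 2)

/*
 * A container can only offer what every device in it supports, so its
 * capability set is the bitwise AND of all device capability sets.
 */
static uint32_t container_caps(const uint32_t *dev_caps, size_t ndev)
{
	uint32_t caps;
	size_t i;

	assert(dev_caps != NULL || ndev == 0);

	if (ndev == 0)
		return 0;	/* empty container supports nothing */

	caps = dev_caps[0];
	for (i = 1; i < ndev; i++)
		caps &= dev_caps[i];
	return caps;
}
```

Mixing a device that reports MAP_UNMAP, PASID and PRI with one that only
reports MAP_UNMAP leaves the container with MAP_UNMAP alone.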

I'm not sure how it will work for platform devices though, some integrated
devices on ARM will support features resembling PASID and PRI. I suspect
these will need ACPI/DT description anyway, that userspace would access
via sysfs.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-17 22:45                                       ` Alex Williamson
@ 2017-07-19 10:45                                             ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-19 10:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lan, Tianyu, Tian, Kevin, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, Will Deacon,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun

On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote:
> On Mon, 17 Jul 2017 10:58:41 +0000
> "Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> 
> > Hi Alex,
> > 
> > Pls refer to the response inline.
> > 
> > > -----Original Message-----
> > > From: kvm-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:kvm-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf
> > > Of Alex Williamson
> > > Sent: Saturday, July 15, 2017 2:16 AM
> > > To: Liu, Yi L <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > Cc: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>; Tian, Kevin
> > > <kevin.tian-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>; Lan, Tianyu
> > > <tianyu.lan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Raj, Ashok <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> > > jasowang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; Will Deacon <Will.Deacon-5wv7dgnIgG8@public.gmane.org>; peterx-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > > qemu-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Pan, Jacob jun
> > > <jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> > > invalidate propagation
> > > 
> > > On Fri, 14 Jul 2017 08:58:02 +0000
> > > "Liu, Yi L" <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> > >   
> > > > Hi Alex,
> > > >
> > > > Against to the opaque open, I'd like to propose the following
> > > > definition based on the existing comments. Pls note that I've merged
> > > > the pasid table binding and iommu tlb invalidation into a single IOCTL
> > > > and make different flags to indicate the iommu operations. Per Kevin's
> > > > comments, there may be iommu invalidation for guest IOVA tlb, so I
> > > > renamed the IOCTL and data structure to be non-svm specific. Pls
> > > > kindly have a review, so that we can make the opaque open closed and
> > > > move forward. Surely, comments and ideas are welcomed. And for the
> > > > scope and flags definition in struct iommu_tlb_invalidate, it's also welcomed to  
> > > give your ideas on it.  
> > > >
> > > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > > >
> > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > >
> > > > Corresponding data structure:
> > > > struct vfio_iommu_operation_info {
> > > > 	__u32	argsz;
> > > > #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > > > #define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace  
> > > driver*/  
> > > > #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > > > #define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
> > > > 	__u32	flag;
> > > > 	__u32	length; // length of the data[] part in byte
> > > > 	__u8	data[]; // stores the data for iommu op indicated by flag field
> > > > };  
> > > 
> > > If we're doing a generic "Ops" ioctl, then we should have an "op" field which is
> > > defined by an enum.  It doesn't make sense to use flags for this, for example can we
> > > set multiple flag bits?  If not then it's not a good use for a bit field.  I'm also not sure I
> > > understand the value of the "length" field, can't it always be calculated from argsz?  
> > 
> > Agreed, enum would be better. "length" field could be calculated from argsz. I used
> > it just to avoid offset calculations. May remove it.
> >  
> > > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> > > > data which would be parsed by the "struct iommu_tlb_invalidate"
> > > > defined below.
> > > >
> > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file)
> > > >
> > > > /* IOMMU model definition for iommu operations from userspace */ enum
> > > > iommu_model {
> > > > 	INTLE_IOMMU,
> > > > 	ARM_SMMU,
> > > > 	AMD_IOMMU,
> > > > 	SPAPR_IOMMU,
> > > > 	S390_IOMMU,
> > > > };
> > > >
> > > > struct iommu_tlb_invalidate {
> > > > 	__u32	scope;
> > > > /* pasid-selective invalidation described by @pasid */
> > > > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > > > /* address-selevtive invalidation described by (@vaddr, @size) */
> > > > #define IOMMU_INVALIDATE_VADDR	(1 << 1)  
> > > 
> > > Again, is a bit field appropriate here, can a user set both bits?  
> > 
> > yes, user may set both bits. It would be invalidate address range
> > which is tagged with a PASID value.
> > 
> > >   
> > > > 	__u32	flags;
> > > > /*  targets non-pasid mappings, @pasid is not valid */
> > > > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > > > /* indicating that the pIOMMU doesn't need to invalidate
> > > > 	all intermediate tables cached as part of the PTE for
> > > > 	vaddr, only the last-level entry (pte). This is a hint. */
> > > > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)  
> > > 
> > > Are we venturing into vendor specific attributes here?  
> > 
> > These two attributes are still in discussion. Jean and me synced
> > several rounds. But lack of comments from other vendors.
> > 
> > Personally, I think both should be generic.
> > IOMMU_INVALIDATE_NO_PASID is to indicate no PASID used
> > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates
> > only invalidate leaf mappings. 
> > I would see if other vendor is object on it. If yes, I'm fine to move
> > it to vendor specific part.
> >  
> > >   
> > > > 	__u32	pasid;
> > > > 	__u64	vaddr;
> > > > 	__u64	size;
> > > > 	enum iommu_model model;  
> > > 
> > > How does a user learn which model(s) are supported by the interface?
> > > How do they learn which ops are supported?  Perhaps a good use for one of those
> > > flag bits in the outer data structure is "probe".  
> > 
> > My initial plan to user fills it, if the underlying HW doesn't support the
> > model, it refuses to service it. User should get a failure and stop to use
> > it. But your suggestion to have a probe or kinds of query makes sense.
> > How about we add one more operation for such purpose? Besides the
> > supported model query, I'd like to add more. E.g the HW IOMMU capabilities.
> 
> We also have VFIO_IOMMU_GET_INFO where the structure can be extended
> for missing capabilities.  Depending on the capability you want to
> describe, this might be a better, existing interface for it.
>  
> > > > 	/*
> > > > 	 Vendor may have different HW version and thus the
> > > > 	 data part of this structure differs, use sub_version
> > > > 	 to indicate such difference.
> > > > 	 */
> > > > 	__u322 sub_version;  
> > > 
> > > Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model
> > > enum.  
> > 
> > Both are fine to me. Just see the opinions from other guys.
> > 
> > > > 	__u64 length; // length of the data[] part in byte  
> > > 
> > > Questionably useful vs calculating from argsz again , but it certainly doesn't need to
> > > be a qword :-o  
> > 
> > Thx for the remind. 32bits would be enough. It is surely to get it from argsz. However,
> > I would like to leave it here. Reason is:
> > argsz is in vfio layer, the "length" here is actually used in vendor-specific iommu driver
> > layer. So would require vfio to pass argsz or the size of " struct iommu_tlb_invalidate"
> > to vendor-specific iommu driver layer by means of parameter or so. Personally, I prefer
> > to pass it in the structure. If it's better to pass it as a parameter, I would do it.
> 
> Ok, then the layer that does the copy_from_user will need to validate
> that length is fully contained within the copied data structure, we
> can't let the user trick the kernel into using kernel memory for this.

VFIO is still the layer that does the copy_from_user, so it would check the length.

> 
> > > > 	__u8	data[];
> > > > };
> > > >
> > > > For Intel, the data structue is:
> > > > struct intel_iommu_invalidate_data {
> > > > 	__u64 low;
> > > > 	__u64 high;
> > > > }  
> > > 
> > > high/low what?  This is a pretty weak uapi definition.  Thanks,  
> > 
> > For this part, for Intel platform, we plan to pass a 128 bit data for the invalidation.
> > The structure varies from invalidation type to type. Here is my thought on it. Define
> > an 128 bits union. List the invalidation data details for each invalidation type. What's
> > your opinion on it? So far, we have 7 types for invalidation. The prq response is not
> > included.
> 
> I want this interface to be fully defined, but at the same time I don't
> necessarily want to create useless data structures.  I believe the
> intention here is to pass these directly through to a QI entry, where

Yes, it's a QI entry from the guest.

> we must match a hardware definition.  I'm tempted to suggest
> referencing the hardware specification, but see below...
> 
> A concern for this model is that hardware may trust the iommu driver
> not to create QI entries that don't set reserved bits or set invalid
> field data.  If it does those kinds of things, it's a kernel driver
> bug.  Once exposed to the user, we cannot guarantee that.  Does Intel
> have confidence that a user cannot maliciously interfere with other
> contexts or the general operation of the invalidation queue if a user is
> effectively given direct access?  Will the invalidation data be
> sanitized by the iommu driver?
>  
> > union intel_iommu_invalidate_data {
> >  	struct {
> > 		__u64 low;
> >  		__u64 high;
> > 	} invalidate_data;
> > 
> > 	struct {
> > 		__u64 type: 4;
> > 		__u64 gran: 2;
> > 		__u64 rsv1: 10;
> > 		__u64 did: 16;
> > 		__u64 sid: 16;
> > 		__u64 func_mask: 2;
> > 		__u64 rsv2: 14;
> > 		__64 rsv3: 64;
> > 	} context_cache_inv;
> > 	....
> 
> Here's part of the issue with not fully defining these, we have did,
> sid, and func_mask.  I think we're claiming that the benefit of passing
> through the hardware data structure is performance, but the user needs
> to replace these IDs to match the physical device rather than the
> virtual device, perhaps even entirely recreating it because there's not
> necessarily a 1:1 mapping of things like func_mask between virtual and
> physical hardware topologies (assuming I'm interpreting these fields
> correctly).  Doesn't the kernel also need to validate any such field to
> prevent the user spoofing entries for other devices?  Is there any
> actual performance benefit remaining vs defining a generic interface
> after multiple levels have manipulated, recreated, and sanitized
> these structures?  We can't evaluate these sorts of risks if we don't
> define what we're passing through.  Thanks,
> 

One potential proposal is to abstract the fields of the QI entry. However,
I have a concern about it. Different types of QI entry have different
fields, so we would need a superset that includes all the possible
fields. Presumably, the set would grow as more QI types are introduced.
I'm not sure if that is an acceptable definition.

Based on the latest spec, the vendor-specific fields may have:

Global hint
Drain read/write
Source-ID
MIP
PFSID

PRQ response is another topic. Not included here.
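
To show what such a superset might look like, here is a rough sketch with
one "present" bit per optional field listed above. All names
(generic_inval, INV_F_*, inval_fields_ok()) are hypothetical and only
illustrate the shape of the idea; the set of bits would have to grow
whenever a new QI type adds a field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per optional field; hypothetical numbering. */
#define INV_F_GLOBAL_HINT	(1u << 0)
#define INV_F_DRAIN_READ	(1u << 1)
#define INV_F_DRAIN_WRITE	(1u << 2)
#define INV_F_SOURCE_ID		(1u << 3)
#define INV_F_MIP		(1u << 4)
#define INV_F_PFSID		(1u << 5)
#define INV_F_ALL		((1u << 6) - 1)

struct generic_inval {
	uint32_t fields;	/* which optional members are meaningful */
	uint16_t source_id;	/* valid if INV_F_SOURCE_ID is set */
	uint16_t pfsid;		/* valid if INV_F_PFSID is set */
};

/*
 * Accept a request only if every field it marks as present is both
 * known to this interface and supported by the underlying hardware.
 */
static bool inval_fields_ok(const struct generic_inval *req,
			    uint32_t supported)
{
	assert(req != NULL);

	if (req->fields & ~INV_F_ALL)
		return false;	/* unknown field bit */
	return (req->fields & ~supported) == 0;
}
```

The unknown-bit check is what lets the set be extended later without
older kernels silently ignoring fields they do not understand.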

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-07-19 10:45                                             ` Liu, Yi L
  0 siblings, 0 replies; 116+ messages in thread
From: Liu, Yi L @ 2017-07-19 10:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Lan, Tianyu, Tian, Kevin, Raj, Ashok, kvm,
	Jean-Philippe Brucker, jasowang, Will Deacon, qemu-devel, peterx,
	iommu, Pan, Jacob jun, Joerg Roedel

On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote:
> On Mon, 17 Jul 2017 10:58:41 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > Hi Alex,
> > 
> > Pls refer to the response inline.
> > 
> > > -----Original Message-----
> > > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> > > Of Alex Williamson
> > > Sent: Saturday, July 15, 2017 2:16 AM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; Tian, Kevin
> > > <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, Tianyu
> > > <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org;
> > > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; peterx@redhat.com;
> > > qemu-devel@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun
> > > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org>
> > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB
> > > invalidate propagation
> > > 
> > > On Fri, 14 Jul 2017 08:58:02 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >   
> > > > Hi Alex,
> > > >
> > > > Against to the opaque open, I'd like to propose the following
> > > > definition based on the existing comments. Pls note that I've merged
> > > > the pasid table binding and iommu tlb invalidation into a single IOCTL
> > > > and make different flags to indicate the iommu operations. Per Kevin's
> > > > comments, there may be iommu invalidation for guest IOVA tlb, so I
> > > > renamed the IOCTL and data structure to be non-svm specific. Pls
> > > > kindly have a review, so that we can make the opaque open closed and
> > > > move forward. Surely, comments and ideas are welcomed. And for the
> > > > scope and flags definition in struct iommu_tlb_invalidate, it's also welcomed to  
> > > give your ideas on it.  
> > > >
> > > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > > >
> > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > >
> > > > Corresponding data structure:
> > > > struct vfio_iommu_operation_info {
> > > > 	__u32	argsz;
> > > > #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > > > #define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace  
> > > driver*/  
> > > > #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > > > #define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
> > > > 	__u32	flag;
> > > > 	__u32	length; // length of the data[] part in byte
> > > > 	__u8	data[]; // stores the data for iommu op indicated by flag field
> > > > };  
> > > 
> > > If we're doing a generic "Ops" ioctl, then we should have an "op" field which is
> > > defined by an enum.  It doesn't make sense to use flags for this, for example can we
> > > set multiple flag bits?  If not then it's not a good use for a bit field.  I'm also not sure I
> > > understand the value of the "length" field, can't it always be calculated from argsz?  
> > 
> > Agreed, enum would be better. "length" field could be calculated from argsz. I used
> > it just to avoid offset calculations. May remove it.
> >  
> > > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores
> > > > data which would be parsed by the "struct iommu_tlb_invalidate"
> > > > defined below.
> > > >
> > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file)
> > > >
> > > > /* IOMMU model definition for iommu operations from userspace */
> > > > enum iommu_model {
> > > > 	INTEL_IOMMU,
> > > > 	ARM_SMMU,
> > > > 	AMD_IOMMU,
> > > > 	SPAPR_IOMMU,
> > > > 	S390_IOMMU,
> > > > };
> > > >
> > > > struct iommu_tlb_invalidate {
> > > > 	__u32	scope;
> > > > /* pasid-selective invalidation described by @pasid */
> > > > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > > > /* address-selective invalidation described by (@vaddr, @size) */
> > > > #define IOMMU_INVALIDATE_VADDR	(1 << 1)  
> > > 
> > > Again, is a bit field appropriate here, can a user set both bits?  
> > 
> > Yes, a user may set both bits. That would invalidate an address range
> > that is tagged with a PASID value.
> > 
> > >   
> > > > 	__u32	flags;
> > > > /*  targets non-pasid mappings, @pasid is not valid */
> > > > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > > > /* indicating that the pIOMMU doesn't need to invalidate
> > > > 	all intermediate tables cached as part of the PTE for
> > > > 	vaddr, only the last-level entry (pte). This is a hint. */
> > > > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)  
> > > 
> > > Are we venturing into vendor specific attributes here?  
> > 
> > These two attributes are still under discussion. Jean and I have synced
> > over several rounds, but we lack comments from other vendors.
> > 
> > Personally, I think both should be generic.
> > IOMMU_INVALIDATE_NO_PASID indicates that no PASID is used
> > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates
> > that only leaf mappings are invalidated.
> > I will see whether other vendors object to it. If so, I'm fine with
> > moving it to the vendor-specific part.
> >  
> > >   
> > > > 	__u32	pasid;
> > > > 	__u64	vaddr;
> > > > 	__u64	size;
> > > > 	enum iommu_model model;  
> > > 
> > > How does a user learn which model(s) are supported by the interface?
> > > How do they learn which ops are supported?  Perhaps a good use for one of those
> > > flag bits in the outer data structure is "probe".  
> > 
> > My initial plan was for the user to fill it in; if the underlying HW doesn't
> > support the model, the kernel refuses the request, and the user gets a
> > failure and stops using it. But your suggestion to have a probe or some kind
> > of query makes sense. How about we add one more operation for that purpose?
> > Besides the supported-model query, I'd like to expose more, e.g. the HW
> > IOMMU capabilities.
> 
> We also have VFIO_IOMMU_GET_INFO where the structure can be extended
> for missing capabilities.  Depending on the capability you want to
> describe, this might be a better, existing interface for it.
>  
> > > > 	/*
> > > > 	 Vendor may have different HW version and thus the
> > > > 	 data part of this structure differs, use sub_version
> > > > 	 to indicate such difference.
> > > > 	 */
> > > > 	__u32 sub_version;
> > > 
> > > Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model
> > > enum.  
> > 
> > Both are fine to me. Just see the opinions from other guys.
> > 
> > > > 	__u64 length; // length of the data[] part in byte  
> > > 
> > > Questionably useful vs calculating from argsz again , but it certainly doesn't need to
> > > be a qword :-o  
> > 
> > Thanks for the reminder. 32 bits would be enough. It can certainly be derived
> > from argsz. However, I would like to keep it here. The reason is that argsz
> > lives in the vfio layer, while the "length" here is actually used in the
> > vendor-specific iommu driver layer. Otherwise vfio would have to pass argsz, or
> > the size of "struct iommu_tlb_invalidate", to the vendor-specific iommu driver
> > by means of a parameter or similar. Personally, I prefer to pass it in the
> > structure. If it's better to pass it as a parameter, I will do that.
> 
> Ok, then the layer that does the copy_from_user will need to validate
> that length is fully contained within the copied data structure, we
> can't let the user trick the kernel into using kernel memory for this.

VFIO is still the layer that does the copy_from_user, so it would check the length.
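
A minimal sketch of that length check, written as plain C so it can be tried
outside the kernel. The header layout is modelled on the structure proposed
above; op_hdr and op_hdr_check are illustrative names, not existing VFIO
symbols, and "copied" stands for the number of bytes the copy_from_user layer
actually brought in:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outer header, modelled on the struct proposed above. */
struct op_hdr {
	uint32_t argsz;   /* total size supplied by the user, header included */
	uint32_t op;      /* operation selector */
	uint32_t length;  /* claimed size of data[] in bytes */
	uint8_t  data[];
};

/*
 * Return 0 if data[] with the claimed length lies entirely within the
 * bytes that were copied from the user, -1 otherwise.
 */
static int op_hdr_check(const struct op_hdr *hdr, size_t copied)
{
	size_t minsz = offsetof(struct op_hdr, data);

	/* The fixed header itself must be present. */
	if (copied < minsz || hdr->argsz < minsz)
		return -1;
	/* argsz may not claim more than was actually copied. */
	if (hdr->argsz > copied)
		return -1;
	/* length may not run past the end of the copied payload. */
	if (hdr->length > hdr->argsz - minsz)
		return -1;
	return 0;
}
```

With a check of this shape in the copying layer, the vendor driver can use
"length" without ever reading past the buffer that came from the user.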

> 
> > > > 	__u8	data[];
> > > > };
> > > >
> > > > For Intel, the data structure is:
> > > > struct intel_iommu_invalidate_data {
> > > > 	__u64 low;
> > > > 	__u64 high;
> > > > };
> > > 
> > > high/low what?  This is a pretty weak uapi definition.  Thanks,  
> > 
> > For this part, on Intel platforms, we plan to pass 128 bits of data for the
> > invalidation. The structure varies from one invalidation type to another. Here
> > is my thought on it: define a 128-bit union and list the invalidation data
> > details for each invalidation type. What's your opinion on it? So far, we have
> > 7 invalidation types. The PRQ response is not included.
> 
> I want this interface to be fully defined, but at the same time I don't
> necessarily want to create useless data structures.  I believe the
> intention here is to pass these directly through to a QI entry, where

yes, it's a QI entry from guest.

> we must match a hardware definition.  I'm tempted to suggest
> referencing the hardware specification, but see below...
> 
> A concern for this model is that hardware may trust the iommu driver
> not to create QI entries that set reserved bits or contain invalid
> field data.  If it does those kinds of things, it's a kernel driver
> bug.  Once exposed to the user, we cannot guarantee that.  Does Intel
> have confidence that a user cannot maliciously interfere with other
> contexts or the general operation of the invalidation queue if a user is
> effectively given direct access?  Will the invalidation data be
> sanitized by the iommu driver?
>  
> > union intel_iommu_invalidate_data {
> >  	struct {
> > 		__u64 low;
> >  		__u64 high;
> > 	} invalidate_data;
> > 
> > 	struct {
> > 		__u64 type: 4;
> > 		__u64 gran: 2;
> > 		__u64 rsv1: 10;
> > 		__u64 did: 16;
> > 		__u64 sid: 16;
> > 		__u64 func_mask: 2;
> > 		__u64 rsv2: 14;
> > 		__u64 rsv3: 64;
> > 	} context_cache_inv;
> > 	....
> 
> Here's part of the issue with not fully defining these, we have did,
> sid, and func_mask.  I think we're claiming that the benefit of passing
> through the hardware data structure is performance, but the user needs
> to replace these IDs to match the physical device rather than the
> virtual device, perhaps even entirely recreating it because there's not
> necessarily a 1:1 mapping of things like func_mask between virtual and
> physical hardware topologies (assuming I'm interpreting these fields
> correctly).  Doesn't the kernel also need to validate any such field to
> prevent the user spoofing entries for other devices?  Is there any
> actual performance benefit remaining vs defining a generic interface
> after multiple levels have manipulated, recreated, and sanitized
> these structures?  We can't evaluate these sorts of risks if we don't
> define what we're passing through.  Thanks,
> 

One possible approach is to abstract the fields of the QI entry. However,
there is a concern with it: different types of QI entry have different
fields, so we would need a superset that includes all the possible
fields. Presumably, that set would grow as more QI types are introduced.
I'm not sure whether that is an acceptable definition.

Based on the latest spec, the vendor-specific fields may include:

Global hint
Drain read/write
Source-ID
MIP
PFSID
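
To make the concern concrete, here is a rough sketch of what such a superset
could look like. Only the two scope bits come from the proposal earlier in
this thread; struct abstract_inv, the INV_F_* bits and abstract_inv_check are
hypothetical names made up for illustration, and the field widths are guesses:

```c
#include <stdint.h>

/* Scope bits as proposed earlier in this thread. */
#define IOMMU_INVALIDATE_PASID	(1u << 0)
#define IOMMU_INVALIDATE_VADDR	(1u << 1)

/* Hypothetical bits: boolean hints, or markers that a field below is valid. */
#define INV_F_GLOBAL_HINT	(1u << 0)
#define INV_F_DRAIN_READ	(1u << 1)
#define INV_F_DRAIN_WRITE	(1u << 2)
#define INV_F_SOURCE_ID		(1u << 3)
#define INV_F_MIP		(1u << 4)
#define INV_F_PFSID		(1u << 5)
#define INV_F_ALL		((1u << 6) - 1)

/* Hypothetical superset; every new QI type may add fields and bits. */
struct abstract_inv {
	uint32_t scope;
	uint32_t fields;
	uint32_t pasid;
	uint16_t source_id;
	uint16_t pfsid;
	uint64_t vaddr;
	uint64_t size;
};

/* Return 0 if only known scope and field bits are set, -1 otherwise. */
static int abstract_inv_check(const struct abstract_inv *inv)
{
	const uint32_t scopes = IOMMU_INVALIDATE_PASID | IOMMU_INVALIDATE_VADDR;

	if (inv->scope & ~scopes)
		return -1;
	if (inv->fields & ~INV_F_ALL)
		return -1;
	/* An address-selective request needs a non-empty range. */
	if ((inv->scope & IOMMU_INVALIDATE_VADDR) && inv->size == 0)
		return -1;
	return 0;
}
```

Each additional QI type would extend both the bit list and the structure,
which is exactly the growth I am worried about.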

PRQ response is another topic. Not included here.

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
  2017-07-19 10:45                                             ` Liu, Yi L
@ 2017-07-19 21:50                                               ` Jacob Pan
  -1 siblings, 0 replies; 116+ messages in thread
From: Jacob Pan @ 2017-07-19 21:50 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Lan, Tianyu, Tian, Kevin, kvm, jasowang, Will Deacon, qemu-devel,
	iommu, jacob.jun.pan

On Wed, 19 Jul 2017 18:45:43 +0800
"Liu, Yi L" <yi.l.liu@linux.intel.com> wrote:

> On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote:
> > On Mon, 17 Jul 2017 10:58:41 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > Hi Alex,
> > > 
> > > Pls refer to the response inline.
> > >   
> > > > -----Original Message-----
> > > > From: kvm-owner@vger.kernel.org
> > > > [mailto:kvm-owner@vger.kernel.org] On Behalf Of Alex Williamson
> > > > Sent: Saturday, July 15, 2017 2:16 AM
> > > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>;
> > > > Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L
> > > > <yi.l.liu@linux.intel.com>; Lan, Tianyu <tianyu.lan@intel.com>;
> > > > Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org;
> > > > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>;
> > > > peterx@redhat.com; qemu-devel@nongnu.org;
> > > > iommu@lists.linux-foundation.org; Pan, Jacob jun
> > > > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org>
> > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL
> > > > for IOMMU TLB invalidate propagation
> > > > 
> > > > On Fri, 14 Jul 2017 08:58:02 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >     
> > > > > Hi Alex,
> > > > >
> > > > > Regarding the open question about the opaque data, I'd like to
> > > > > propose the following definition based on the existing
> > > > > comments. Please note that I've merged the PASID table binding
> > > > > and IOMMU TLB invalidation into a single IOCTL and use
> > > > > different flags to indicate the IOMMU operation. Per Kevin's
> > > > > comments, there may be IOMMU invalidation for the guest IOVA
> > > > > TLB, so I renamed the IOCTL and data structure to be non-SVM
> > > > > specific. Please review, so that we can close this open issue
> > > > > and move forward. Comments and ideas are welcome, including on
> > > > > the scope and flags definitions in struct
> > > > > iommu_tlb_invalidate.
> > > > >
> > > > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > > > >
> > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > > >
> > > > > Corresponding data structure:
> > > > > struct vfio_iommu_operation_info {
> > > > > 	__u32	argsz;
> > > > > #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
> > > > > #define VFIO_IOMMU_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> > > > > #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
> > > > > #define VFIO_IOMMU_INVAL_IOTLB	(1 << 3) /* Invalidate iommu tlb */
> > > > > 	__u32	flag;
> > > > > 	__u32	length; // length of the data[] part in byte
> > > > > 	__u8	data[]; // stores the data for iommu op indicated by flag field
> > > > > };
> > > > 
> > > > If we're doing a generic "Ops" ioctl, then we should have an
> > > > "op" field which is defined by an enum.  It doesn't make sense
> > > > to use flags for this, for example can we set multiple flag
> > > > bits?  If not then it's not a good use for a bit field.  I'm
> > > > also not sure I understand the value of the "length" field,
> > > > can't it always be calculated from argsz?    
> > > 
> > > Agreed, enum would be better. "length" field could be calculated
> > > from argsz. I used it just to avoid offset calculations. May
> > > remove it. 
> > > > > For iommu tlb invalidation from userspace, the "__u8 data[]"
> > > > > stores data which would be parsed by the "struct
> > > > > iommu_tlb_invalidate" defined below.
> > > > >
> > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added
> > > > > header file)
> > > > >
> > > > > /* IOMMU model definition for iommu operations from userspace */
> > > > > enum iommu_model {
> > > > > 	INTEL_IOMMU,
> > > > > 	ARM_SMMU,
> > > > > 	AMD_IOMMU,
> > > > > 	SPAPR_IOMMU,
> > > > > 	S390_IOMMU,
> > > > > };
> > > > >
> > > > > struct iommu_tlb_invalidate {
> > > > > 	__u32	scope;
> > > > > /* pasid-selective invalidation described by @pasid */
> > > > > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > > > > /* address-selective invalidation described by (@vaddr, @size) */
> > > > > #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > > > 
> > > > Again, is a bit field appropriate here, can a user set both
> > > > bits?    
> > > 
> > > Yes, a user may set both bits. That would invalidate an address
> > > range that is tagged with a PASID value.
> > >   
> > > >     
> > > > > 	__u32	flags;
> > > > > /*  targets non-pasid mappings, @pasid is not valid */
> > > > > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > > > > /* indicating that the pIOMMU doesn't need to invalidate
> > > > > 	all intermediate tables cached as part of the PTE for
> > > > > 	vaddr, only the last-level entry (pte). This is a hint. */
> > > > > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
> > > > 
> > > > Are we venturing into vendor specific attributes here?    
> > > 
> > > These two attributes are still under discussion. Jean and I have
> > > synced over several rounds, but we lack comments from other vendors.
> > > 
> > > Personally, I think both should be generic.
> > > IOMMU_INVALIDATE_NO_PASID indicates that no PASID is used
> > > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates
> > > that only leaf mappings are invalidated.
> > > I will see whether other vendors object to it. If so, I'm fine
> > > with moving it to the vendor-specific part.
> > >    
> > > >     
> > > > > 	__u32	pasid;
> > > > > 	__u64	vaddr;
> > > > > 	__u64	size;
> > > > > 	enum iommu_model model;    
> > > > 
> > > > How does a user learn which model(s) are supported by the
> > > > interface? How do they learn which ops are supported?  Perhaps
> > > > a good use for one of those flag bits in the outer data
> > > > structure is "probe".    
> > > 
> > > My initial plan was for the user to fill it in; if the underlying
> > > HW doesn't support the model, the kernel refuses the request, and
> > > the user gets a failure and stops using it. But your suggestion
> > > to have a probe or some kind of query makes sense. How about we
> > > add one more operation for that purpose? Besides the supported-model
> > > query, I'd like to expose more, e.g. the HW IOMMU capabilities.
> > 
> > We also have VFIO_IOMMU_GET_INFO where the structure can be extended
> > for missing capabilities.  Depending on the capability you want to
> > describe, this might be a better, existing interface for it.
> >    
> > > > > 	/*
> > > > > 	 Vendor may have different HW version and thus the
> > > > > 	 data part of this structure differs, use sub_version
> > > > > 	 to indicate such difference.
> > > > > 	 */
> > > > > 	__u32 sub_version;
> > > > 
> > > > Not sure I see the value of this vs creating an INTEL_IOMMUv2
> > > > entry in the model enum.    
> > > 
> > > Both are fine to me. Just see the opinions from other guys.
> > >   
> > > > > 	__u64 length; // length of the data[] part in byte    
> > > > 
> > > > Questionably useful vs calculating from argsz again , but it
> > > > certainly doesn't need to be a qword :-o    
> > > 
> > > Thanks for the reminder. 32 bits would be enough. It can certainly
> > > be derived from argsz. However, I would like to keep it here. The
> > > reason is that argsz lives in the vfio layer, while the "length"
> > > here is actually used in the vendor-specific iommu driver layer.
> > > Otherwise vfio would have to pass argsz, or the size of "struct
> > > iommu_tlb_invalidate", to the vendor-specific iommu driver by
> > > means of a parameter or similar. Personally, I prefer to pass it
> > > in the structure. If it's better to pass it as a parameter, I
> > > will do that.
> > 
> > Ok, then the layer that does the copy_from_user will need to
> > validate that length is fully contained within the copied data
> > structure, we can't let the user trick the kernel into using kernel
> > memory for this.  
> 
> VFIO is still the layer that does the copy_from_user, so it would
> check the length.
> 
> >   
> > > > > 	__u8	data[];
> > > > > };
> > > > >
> > > > > For Intel, the data structure is:
> > > > > struct intel_iommu_invalidate_data {
> > > > > 	__u64 low;
> > > > > 	__u64 high;
> > > > > };
> > > > 
> > > > high/low what?  This is a pretty weak uapi definition.
> > > > Thanks,    
> > > 
> > > For this part, on Intel platforms, we plan to pass 128 bits of
> > > data for the invalidation. The structure varies from one
> > > invalidation type to another. Here is my thought on it: define a
> > > 128-bit union and list the invalidation data details for each
> > > invalidation type. What's your opinion on it? So far, we have 7
> > > invalidation types. The PRQ response is not included.
> > 
> > I want this interface to be fully defined, but at the same time I
> > don't necessarily want to create useless data structures.  I
> > believe the intention here is to pass these directly through to a
> > QI entry, where  
> 
> yes, it's a QI entry from guest.
> 
> > we must match a hardware definition.  I'm tempted to suggest
> > referencing the hardware specification, but see below...
> > 
> > A concern for this model is that hardware may trust the iommu driver
> > not to create QI entries that set reserved bits or contain invalid
> > field data.  If it does those kinds of things, it's a kernel driver
> > bug.  Once exposed to the user, we cannot guarantee that.  Does
> > Intel have confidence that a user cannot maliciously interfere with
> > other contexts or the general operation of the invalidation queue
> > if a user is effectively given direct access?  Will the
> > invalidation data be sanitized by the iommu driver?
> >    
> > > union intel_iommu_invalidate_data {
> > >  	struct {
> > > 		__u64 low;
> > >  		__u64 high;
> > > 	} invalidate_data;
> > > 
> > > 	struct {
> > > 		__u64 type: 4;
> > > 		__u64 gran: 2;
> > > 		__u64 rsv1: 10;
> > > 		__u64 did: 16;
> > > 		__u64 sid: 16;
> > > 		__u64 func_mask: 2;
> > > 		__u64 rsv2: 14;
> > > 		__u64 rsv3: 64;
> > > 	} context_cache_inv;
> > > 	....  
> > 
> > Here's part of the issue with not fully defining these, we have did,
> > sid, and func_mask.  I think we're claiming that the benefit of
> > passing through the hardware data structure is performance, but the
> > user needs to replace these IDs to match the physical device rather
> > than the virtual device, perhaps even entirely recreating it
> > because there's not necessarily a 1:1 mapping of things like
> > func_mask between virtual and physical hardware topologies
> > (assuming I'm interpreting these fields correctly).  Doesn't the
> > kernel also need to validate any such field to prevent the user
> > spoofing entries for other devices?  Is there any actual
> > performance benefit remaining vs defining a generic interface after
> > multiple levels have manipulated, recreated, and sanitized these
> > structures?  We can't evaluate these sorts of risks if we don't
> > define what we're passing through.  Thanks, 
> 
> One possible approach is to abstract the fields of the QI entry.
> However, there is a concern with it: different types of QI entry
> have different fields, so we would need a superset that includes
> all the possible fields. Presumably, that set would grow as more
> QI types are introduced. I'm not sure whether that is an acceptable
> definition.
> 
> Based on the latest spec, the vendor-specific fields may include:
> 
> Global hint
> Drain read/write
> Source-ID
> MIP
> PFSID
> 
My thinking was that as long as the risk of passing some opaque data is
limited to the device that is already exposed to user space, it
should be fine. We have the model-specific IOMMU driver to sanitize the
data before putting the descriptor into hardware.
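
As a rough illustration of what such sanitizing could look like for the
context-cache invalidation descriptor quoted above: the bit positions below
simply follow the field order and widths of that proposed union, laid out from
bit 0 upward (an assumption, since bit-field ordering is compiler dependent),
and qi_desc128 and cc_inv_sanitize are made-up names, not existing intel-iommu
functions:

```c
#include <stdint.h>

/* Field positions follow the context_cache_inv layout quoted above. */
#define CC_TYPE_MASK	0xfULL			/* bits 3:0 */
#define CC_GRAN_MASK	(0x3ULL << 4)		/* bits 5:4 */
#define CC_DID_SHIFT	16			/* bits 31:16 */
#define CC_SID_SHIFT	32			/* bits 47:32 */
/* rsv1 is bits 15:6, rsv2 is bits 63:50 */
#define CC_RSV_LOW	((0x3ffULL << 6) | (0x3fffULL << 50))

/* Hypothetical 128-bit descriptor as two 64-bit halves. */
struct qi_desc128 {
	uint64_t low;
	uint64_t high;
};

/*
 * Build a host descriptor from a guest one. Reserved bits must be zero,
 * DID and SID are replaced with values the host owns, and func_mask is
 * dropped because the guest topology need not match the physical one.
 * Returns 0 on success, -1 if the guest descriptor is malformed.
 */
static int cc_inv_sanitize(const struct qi_desc128 *guest,
			   uint16_t host_did, uint16_t host_sid,
			   struct qi_desc128 *out)
{
	if ((guest->low & CC_RSV_LOW) || guest->high)
		return -1;

	out->low = (guest->low & (CC_TYPE_MASK | CC_GRAN_MASK)) |
		   ((uint64_t)host_did << CC_DID_SHIFT) |
		   ((uint64_t)host_sid << CC_SID_SHIFT);
	out->high = 0;
	return 0;
}
```

Only the type and granularity survive from the guest; everything that
identifies a device or domain is rewritten by the host, which keeps the
opaque part from reaching beyond the assigned device.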

But I agree the overhead of disassembling and reassembling may not be
significant. Though with a vIOMMU and caching mode = 1 (which requires
explicit invalidation of caches regardless of whether the entry is
present, VT-d spec 6.1), we will see more invalidations than in the
native pIOMMU case.

Anyway, we can do some micro-benchmarking to see the overhead.

> PRQ response is another topic. Not included here.
> 
> Thanks,
> Yi L
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
@ 2017-07-19 21:50                                               ` Jacob Pan
  0 siblings, 0 replies; 116+ messages in thread
From: Jacob Pan @ 2017-07-19 21:50 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Liu, Yi L, Lan, Tianyu, Tian, Kevin, Raj, Ashok,
	kvm, Jean-Philippe Brucker, jasowang, Will Deacon, qemu-devel,
	peterx, iommu, Joerg Roedel, jacob.jun.pan

On Wed, 19 Jul 2017 18:45:43 +0800
"Liu, Yi L" <yi.l.liu@linux.intel.com> wrote:

> On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote:
> > On Mon, 17 Jul 2017 10:58:41 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > Hi Alex,
> > > 
> > > Pls refer to the response inline.
> > >   
> > > > -----Original Message-----
> > > > From: kvm-owner@vger.kernel.org
> > > > [mailto:kvm-owner@vger.kernel.org] On Behalf Of Alex Williamson
> > > > Sent: Saturday, July 15, 2017 2:16 AM
> > > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>;
> > > > Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L
> > > > <yi.l.liu@linux.intel.com>; Lan, Tianyu <tianyu.lan@intel.com>;
> > > > Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org;
> > > > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>;
> > > > peterx@redhat.com; qemu-devel@nongnu.org;
> > > > iommu@lists.linux-foundation.org; Pan, Jacob jun
> > > > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org>
> > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL
> > > > for IOMMU TLB invalidate propagation
> > > > 
> > > > On Fri, 14 Jul 2017 08:58:02 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >     
> > > > > Hi Alex,
> > > > >
> > > > > Against to the opaque open, I'd like to propose the following
> > > > > definition based on the existing comments. Pls note that I've
> > > > > merged the pasid table binding and iommu tlb invalidation
> > > > > into a single IOCTL and make different flags to indicate the
> > > > > iommu operations. Per Kevin's comments, there may be iommu
> > > > > invalidation for guest IOVA tlb, so I renamed the IOCTL and
> > > > > data structure to be non-svm specific. Pls kindly have a
> > > > > review, so that we can make the opaque open closed and move
> > > > > forward. Surely, comments and ideas are welcomed. And for the
> > > > > scope and flags definition in struct iommu_tlb_invalidate,
> > > > > it's also welcomed to    
> > > > give your ideas on it.    
> > > > >
> > > > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > > > >
> > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > > >
> > > > > Corresponding data structure:
> > > > > struct vfio_iommu_operation_info {
> > > > > 	__u32	argsz;
> > > > > #define VFIO_IOMMU_BIND_PASIDTBL	(1 << 0) /* Bind
> > > > > PASID Table */ #define VFIO_IOMMU_BIND_PASID	(1 <<
> > > > > 1) /* Bind PASID from userspace    
> > > > driver*/    
> > > > > #define VFIO_IOMMU_BIND_PGTABLE	(1 << 2) /* Bind guest
> > > > > mmu page table */ #define VFIO_IOMMU_INVAL_IOTLB	(1 <<
> > > > > 3) /* Invalidate iommu tlb */ __u32	flag;
> > > > > 	__u32	length; // length of the data[] part in
> > > > > byte __u8	data[]; // stores the data for iommu op
> > > > > indicated by flag field };    
> > > > 
> > > > If we're doing a generic "Ops" ioctl, then we should have an
> > > > "op" field which is defined by an enum.  It doesn't make sense
> > > > to use flags for this, for example can we set multiple flag
> > > > bits?  If not then it's not a good use for a bit field.  I'm
> > > > also not sure I understand the value of the "length" field,
> > > > can't it always be calculated from argsz?    
> > > 
> > > Agreed, enum would be better. "length" field could be calculated
> > > from argsz. I used it just to avoid offset calculations. May
> > > remove it. 
> > > > > For iommu tlb invalidation from userspace, the "__u8 data[]"
> > > > > stores data which would be parsed by the "struct
> > > > > iommu_tlb_invalidate" defined below.
> > > > >
> > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added
> > > > > header file)
> > > > >
> > > > > /* IOMMU model definition for iommu operations from userspace
> > > > > */ enum iommu_model {
> > > > > 	INTLE_IOMMU,
> > > > > 	ARM_SMMU,
> > > > > 	AMD_IOMMU,
> > > > > 	SPAPR_IOMMU,
> > > > > 	S390_IOMMU,
> > > > > };
> > > > >
> > > > > struct iommu_tlb_invalidate {
> > > > > 	__u32	scope;
> > > > > /* pasid-selective invalidation described by @pasid */
> > > > > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > > > > /* address-selevtive invalidation described by (@vaddr,
> > > > > @size) */ #define IOMMU_INVALIDATE_VADDR	(1 << 1)    
> > > > 
> > > > Again, is a bit field appropriate here, can a user set both
> > > > bits?    
> > > 
> > > yes, user may set both bits. It would be invalidate address range
> > > which is tagged with a PASID value.
> > >   
> > > >     
> > > > > 	__u32	flags;
> > > > > /*  targets non-pasid mappings, @pasid is not valid */
> > > > > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> > > > > /* indicating that the pIOMMU doesn't need to invalidate
> > > > > 	all intermediate tables cached as part of the PTE for
> > > > > 	vaddr, only the last-level entry (pte). This is a
> > > > > hint. */ #define IOMMU_INVALIDATE_VADDR_LEAF	(1 <<
> > > > > 1)    
> > > > 
> > > > Are we venturing into vendor specific attributes here?    
> > > 
> > > These two attributes are still in discussion. Jean and me synced
> > > several rounds. But lack of comments from other vendors.
> > > 
> > > Personally, I think both should be generic.
> > > IOMMU_INVALIDATE_NO_PASID is to indicate no PASID used
> > > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates
> > > only invalidate leaf mappings. 
> > > I would see if other vendor is object on it. If yes, I'm fine to
> > > move it to vendor specific part.
> > >    
> > > >     
> > > > > 	__u32	pasid;
> > > > > 	__u64	vaddr;
> > > > > 	__u64	size;
> > > > > 	enum iommu_model model;    
> > > > 
> > > > How does a user learn which model(s) are supported by the
> > > > interface? How do they learn which ops are supported?  Perhaps
> > > > a good use for one of those flag bits in the outer data
> > > > structure is "probe".    
> > > 
> > > My initial plan to user fills it, if the underlying HW doesn't
> > > support the model, it refuses to service it. User should get a
> > > failure and stop to use it. But your suggestion to have a probe
> > > or kinds of query makes sense. How about we add one more
> > > operation for such purpose? Besides the supported model query,
> > > I'd like to add more. E.g the HW IOMMU capabilities.  
> > 
> > We also have VFIO_IOMMU_GET_INFO where the structure can be extended
> > for missing capabilities.  Depending on the capability you want to
> > describe, this might be a better, existing interface for it.
> >    
> > > > > 	/*
> > > > > 	 Vendor may have different HW version and thus the
> > > > > 	 data part of this structure differs, use sub_version
> > > > > 	 to indicate such difference.
> > > > > 	 */
> > > > > 	__u322 sub_version;    
> > > > 
> > > > Not sure I see the value of this vs creating an INTEL_IOMMUv2
> > > > entry in the model enum.    
> > > 
> > > Both are fine to me. Just see the opinions from other guys.
> > >   
> > > > > 	__u64 length; // length of the data[] part in byte    
> > > > 
> > > > Questionably useful vs calculating from argsz again , but it
> > > > certainly doesn't need to be a qword :-o    
> > > 
> > > Thx for the remind. 32bits would be enough. It is surely to get
> > > it from argsz. However, I would like to leave it here. Reason is:
> > > argsz is in vfio layer, the "length" here is actually used in
> > > vendor-specific iommu driver layer. So would require vfio to pass
> > > argsz or the size of " struct iommu_tlb_invalidate" to
> > > vendor-specific iommu driver layer by means of parameter or so.
> > > Personally, I prefer to pass it in the structure. If it's better
> > > to pass it as a parameter, I would do it.  
> > 
> > Ok, then the layer that does the copy_from_user will need to
> > validate that length is fully contained within the copied data
> > structure, we can't let the user trick the kernel into using kernel
> > memory for this.  
> 
> VFIO is still the layer which copy_from_user. would check the length.
> 
> >   
> > > > > 	__u8	data[];
> > > > > };
> > > > >
> > > > > For Intel, the data structue is:
> > > > > struct intel_iommu_invalidate_data {
> > > > > 	__u64 low;
> > > > > 	__u64 high;
> > > > > }    
> > > > 
> > > > high/low what?  This is a pretty weak uapi definition.
> > > > Thanks,    
> > > 
> > > For this part, for Intel platform, we plan to pass a 128 bit data
> > > for the invalidation. The structure varies from invalidation type
> > > to type. Here is my thought on it. Define an 128 bits union. List
> > > the invalidation data details for each invalidation type. What's
> > > your opinion on it? So far, we have 7 types for invalidation. The
> > > prq response is not included.  
> > 
> > I want this interface to be fully defined, but at the same time I
> > don't necessarily want to create useless data structures.  I
> > believe the intention here is to pass these directly through to a
> > QI entry, where  
> 
> yes, it's a QI entry from guest.
> 
> > we must match a hardware definition.  I'm tempted to suggest
> > referencing the hardware specification, but see below...
> > 
> > A concern for this model is that hardware may trust the iommu driver
> > not to create QI entries that don't set reserved bits or set invalid
> > field data.  If it does those kinds of things, it's a kernel driver
> > bug.  Once exposed to the user, we cannot guarantee that.  Does
> > Intel have confidence that a user cannot maliciously interfere with
> > other contexts or the general operation of the invalidation queue
> > if a user is effectively given direct access?  Will the
> > invalidation data be sanitized by the iommu driver?
> >    
> > > union intel_iommu_invalidate_data {
> > >  	struct {
> > > 		__u64 low;
> > >  		__u64 high;
> > > 	} invalidate_data;
> > > 
> > > 	struct {
> > > 		__u64 type: 4;
> > > 		__u64 gran: 2;
> > > 		__u64 rsv1: 10;
> > > 		__u64 did: 16;
> > > 		__u64 sid: 16;
> > > 		__u64 func_mask: 2;
> > > 		__u64 rsv2: 14;
> > > 		__64 rsv3: 64;
> > > 	} context_cache_inv;
> > > 	....  
> > 
> > Here's part of the issue with not fully defining these, we have did,
> > sid, and func_mask.  I think we're claiming that the benefit of
> > passing through the hardware data structure is performance, but the
> > user needs to replace these IDs to match the physical device rather
> > than the virtual device, perhaps even entirely recreating it
> > because there's not necessarily a 1:1 mapping of things like
> > func_mask between virtual and physical hardware topologies
> > (assuming I'm interpreting these fields correctly).  Doesn't the
> > kernel also need to validate any such field to prevent the user
> > spoofing entries for other devices?  Is there any actual
> > performance benefit remaining vs defining a generic interface after
> > multiple levels have manipulated, recreated, and sanitized these
> > structures?  We can't evaluate these sorts of risks if we don't
> > define what we're passing through.  Thanks, 
> 
> A potential proposal is to abstract the fields of the QI entry.
> However, there is a concern with that: different types of QI entry
> have different fields, so we would need a superset that includes
> all the possible fields. Presumably, the set would grow as more
> QI types are introduced. I'm not sure that is an acceptable definition.
> 
> Based on the latest spec, the vendor-specific fields may have:
> 
> Global hint
> Drain read/write
> Source-ID
> MIP
> PFSID
> 
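For reference, the superset idea could be sketched roughly as below. This
is only an illustration: the structure name, the flag names, and the
checker are hypothetical and not part of this patchset. Only the field
list (global hint, drain read/write, source-ID, MIP, PFSID) comes from the
mail above.

```c
#include <stdint.h>
#include <errno.h>

/* Hypothetical flags saying which optional fields the caller filled in.
 * Each new QI type that needs a new field would add a flag here, which
 * is exactly the growth concern raised above. */
#define INV_F_GLOBAL_HINT  (UINT32_C(1) << 0)
#define INV_F_DRAIN_READ   (UINT32_C(1) << 1)
#define INV_F_DRAIN_WRITE  (UINT32_C(1) << 2)
#define INV_F_SOURCE_ID    (UINT32_C(1) << 3)
#define INV_F_MIP          (UINT32_C(1) << 4)
#define INV_F_PFSID        (UINT32_C(1) << 5)
#define INV_F_KNOWN        (INV_F_GLOBAL_HINT | INV_F_DRAIN_READ | \
                            INV_F_DRAIN_WRITE | INV_F_SOURCE_ID | \
                            INV_F_MIP | INV_F_PFSID)

/* Hypothetical generic invalidation request: a superset of the fields
 * that the individual QI descriptor types may carry. */
struct generic_inv_info {
	uint32_t type;        /* invalidation type, vendor neutral */
	uint32_t granularity; /* invalidation granularity */
	uint32_t flags;       /* INV_F_* bits describing what is valid */
	uint16_t source_id;   /* valid only with INV_F_SOURCE_ID */
	uint16_t pfsid;       /* valid only with INV_F_PFSID */
};

/* Reject flags this version does not know and fields that are set
 * without their flag, so that stray user data never reaches hardware. */
int generic_inv_check(const struct generic_inv_info *info)
{
	if (info->flags & ~INV_F_KNOWN)
		return -EINVAL;
	if (info->source_id && !(info->flags & INV_F_SOURCE_ID))
		return -EINVAL;
	if (info->pfsid && !(info->flags & INV_F_PFSID))
		return -EINVAL;
	return 0;
}
```

With a layout like this, each vendor driver would have to build its own
descriptor from the checked fields, which is the disassemble/assemble cost
discussed in this thread.
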
My thinking was that as long as the risk of having some opaque data is
limited to the device that is already exposed to user space, it
should be fine. We have the model-specific IOMMU driver to sanitize the
data before putting the descriptor into hardware.
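
To make the sanitizing step concrete, here is a minimal sketch for the
context-cache case. It assumes the bit layout of the context_cache_inv
bitfield quoted above (not re-checked against the spec here), and the
function and macro names are hypothetical, not the current intel-iommu
code. Clearing func_mask is a policy assumption of the sketch, and the
type field is left for the caller to check.

```c
#include <stdint.h>
#include <errno.h>

/* Low 64 bits, per the context_cache_inv bitfield quoted above. */
#define CC_RSV1_MASK   (UINT64_C(0x3ff)  << 6)   /* bits 15:6  */
#define CC_DID_SHIFT   16
#define CC_DID_MASK    (UINT64_C(0xffff) << 16)  /* bits 31:16 */
#define CC_SID_SHIFT   32
#define CC_SID_MASK    (UINT64_C(0xffff) << 32)  /* bits 47:32 */
#define CC_FM_MASK     (UINT64_C(0x3)    << 48)  /* bits 49:48 */
#define CC_RSV2_MASK   (UINT64_C(0x3fff) << 50)  /* bits 63:50 */

/* Raw 128-bit descriptor as passed down from the guest. */
struct qi_desc128 {
	uint64_t low;
	uint64_t high;
};

/* Validate a guest descriptor and rewrite the IDs the guest must not
 * control. phys_did/phys_sid are the host-side domain and source IDs
 * of the device already assigned to this user. */
int sanitize_context_cache_inv(struct qi_desc128 *d,
			       uint16_t phys_did, uint16_t phys_sid)
{
	/* Reserved bits must be zero; the high qword is fully reserved. */
	if ((d->low & (CC_RSV1_MASK | CC_RSV2_MASK)) || d->high)
		return -EINVAL;

	/* Drop guest-chosen did/sid/func_mask, keep type and granularity. */
	d->low &= ~(CC_DID_MASK | CC_SID_MASK | CC_FM_MASK);
	d->low |= ((uint64_t)phys_did << CC_DID_SHIFT) |
		  ((uint64_t)phys_sid << CC_SID_SHIFT);
	return 0;
}
```

The point is that the guest-provided did and sid never reach the queue,
so the descriptor can only affect the device the user already owns.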

But I agree the overhead of disassembling and reassembling the
descriptor may not be significant. That said, with a vIOMMU and caching
mode = 1 (which requires explicit invalidation of caches regardless of
whether the entries are present or not, VT-d spec 6.1), we will see more
invalidations than in the native pIOMMU case.

Anyway, we can run some micro-benchmarks to measure the overhead.

> PRQ response is another topic. Not included here.
> 
> Thanks,
> Yi L
> 

[Jacob Pan]


