All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
@ 2017-06-27 19:47 ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan

Shared virtual memory (SVM) space between devices and applications can
reduce programming complexity and enhance security. To enable SVM in
the guest, i.e. share guest application address space with physical
device DMA address, IOMMU driver must provide some new functionalities.

The complete guest SVM support also involves changes in QEMU and VFIO,
which has been posted earlier.
https://www.spinics.net/lists/kvm/msg148798.html

This is the IOMMU portion follow up of the more complete series of the
kernel changes to support SVM. Please refer to the link below for more
details. https://www.spinics.net/lists/kvm/msg148819.html

Generic APIs are introduced in addition to Intel VT-d specific changes,
the goal is to have common interfaces across IOMMU and device types for
both VFIO and other in-kernel users.

At the top level, three new IOMMU interfaces are introduced:
 - bind PASID table
 - passdown invalidation
 - per device IOMMU fault notification

The additional patches are Intel VT-d specific, which either implements or
replaces existing private interfaces with the generic ones.

Thanks,

Jacob


Jacob Pan (8):
  iommu: Introduce bind_pasid_table API function
  iommu/vt-d: add bind_pasid_table function
  iommu/vt-d: Add iommu do invalidate function
  iommu: Introduce fault notifier API
  iommu/vt-d: track device with pasid table bond to a guest
  iommu/dmar: notify unrecoverable faults
  iommu/intel-svm: notify page request to guest
  iommu/intel-svm: replace dev ops with generic fault notifier

Liu, Yi L (1):
  iommu: Introduce iommu do invalidate API function

 drivers/iommu/dmar.c          |  38 ++++++++-
 drivers/iommu/intel-iommu.c   | 177 +++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/intel-svm.c     | 102 +++++++++++++++++++++---
 drivers/iommu/iommu.c         |  77 ++++++++++++++++++
 include/linux/dma_remapping.h |   1 +
 include/linux/intel-iommu.h   |  30 ++++++-
 include/linux/intel-svm.h     |  20 +----
 include/linux/iommu.h         |  59 ++++++++++++++
 include/uapi/linux/iommu.h    |  85 ++++++++++++++++++++
 9 files changed, 540 insertions(+), 49 deletions(-)
 create mode 100644 include/uapi/linux/iommu.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
@ 2017-06-27 19:47 ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	Joerg Roedel, David Woodhouse
  Cc: Lan Tianyu, Tian, Kevin, Jean Delvare

Shared virtual memory (SVM) space between devices and applications can
reduce programming complexity and enhance security. To enable SVM in
the guest, i.e. share guest application address space with physical
device DMA address, IOMMU driver must provide some new functionalities.

The complete guest SVM support also involves changes in QEMU and VFIO,
which has been posted earlier.
https://www.spinics.net/lists/kvm/msg148798.html

This is the IOMMU portion follow up of the more complete series of the
kernel changes to support SVM. Please refer to the link below for more
details. https://www.spinics.net/lists/kvm/msg148819.html

Generic APIs are introduced in addition to Intel VT-d specific changes,
the goal is to have common interfaces across IOMMU and device types for
both VFIO and other in-kernel users.

At the top level, three new IOMMU interfaces are introduced:
 - bind PASID table
 - passdown invalidation
 - per device IOMMU fault notification

The additional patches are Intel VT-d specific, which either implements or
replaces existing private interfaces with the generic ones.

Thanks,

Jacob


Jacob Pan (8):
  iommu: Introduce bind_pasid_table API function
  iommu/vt-d: add bind_pasid_table function
  iommu/vt-d: Add iommu do invalidate function
  iommu: Introduce fault notifier API
  iommu/vt-d: track device with pasid table bond to a guest
  iommu/dmar: notify unrecoverable faults
  iommu/intel-svm: notify page request to guest
  iommu/intel-svm: replace dev ops with generic fault notifier

Liu, Yi L (1):
  iommu: Introduce iommu do invalidate API function

 drivers/iommu/dmar.c          |  38 ++++++++-
 drivers/iommu/intel-iommu.c   | 177 +++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/intel-svm.c     | 102 +++++++++++++++++++++---
 drivers/iommu/iommu.c         |  77 ++++++++++++++++++
 include/linux/dma_remapping.h |   1 +
 include/linux/intel-iommu.h   |  30 ++++++-
 include/linux/intel-svm.h     |  20 +----
 include/linux/iommu.h         |  59 ++++++++++++++
 include/uapi/linux/iommu.h    |  85 ++++++++++++++++++++
 9 files changed, 540 insertions(+), 49 deletions(-)
 create mode 100644 include/uapi/linux/iommu.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/9] iommu: Introduce bind_pasid_table API function
  2017-06-27 19:47 ` Jacob Pan
  (?)
@ 2017-06-27 19:47 ` Jacob Pan
  2017-06-28  9:57     ` Joerg Roedel
  -1 siblings, 1 reply; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan, Liu, Yi L

Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
case in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

As part of the proposed architecture, when a SVM capable PCI
device is assigned to a guest, nested mode is turned on. Guest owns the
first level page tables (request with PASID) and performs GVA->GPA
translation. Second level page tables are owned by the host for GPA->HPA
translation for both request with and without PASID.

A new IOMMU driver interface is therefore needed to perform tasks as
follows:
* Enable nested translation and appropriate translation type
* Assign guest PASID table pointer (in GPA) and size to host IOMMU

This patch introduces new functions called iommu_(un)bind_pasid_table()
to IOMMU APIs. Architecture specific IOMMU function can be added later
to perform the specific steps for binding pasid table of assigned devices.

This patch also adds model definition in iommu.h. It would be used to
check if the bind request is from a compatible entity. e.g. a bind
request from an intel_iommu emulator may not be supported by an ARM SMMU
driver.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/iommu.c      | 19 +++++++++++++++++++
 include/linux/iommu.h      | 23 +++++++++++++++++++++++
 include/uapi/linux/iommu.h | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)
 create mode 100644 include/uapi/linux/iommu.h

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cf7ca7e..494309b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1328,6 +1328,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+			struct pasid_table_info *pasidt_binfo)
+{
+	if (unlikely(!domain->ops->bind_pasid_table))
+		return -EINVAL;
+
+	return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
+
+int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+	if (unlikely(!domain->ops->unbind_pasid_table))
+		return -EINVAL;
+
+	return domain->ops->unbind_pasid_table(domain, dev);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54ad..7122add 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <uapi/linux/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -183,6 +184,8 @@ struct iommu_resv_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @bind_pasid_table: bind pasid table pointer for guest SVM
+ * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -225,6 +228,10 @@ struct iommu_ops {
 	u32 (*domain_get_windows)(struct iommu_domain *domain);
 
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+	int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
+				struct pasid_table_info *pasidt_binfo);
+	int (*unbind_pasid_table)(struct iommu_domain *domain,
+				struct device *dev);
 
 	unsigned long pgsize_bitmap;
 };
@@ -282,6 +289,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_bind_pasid_table(struct iommu_domain *domain,
+		struct device *dev, struct pasid_table_info *pasidt_binfo);
+extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
+				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
@@ -637,6 +648,18 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+			struct pasid_table_info *pasidt_binfo)
+{
+	return -EINVAL;
+}
+static inline
+int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index 0000000..9089a30
--- /dev/null
+++ b/include/uapi/linux/iommu.h
@@ -0,0 +1,38 @@
+/*
+ * IOMMU user API definitions
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include <linux/types.h>
+
+enum iommu_model {
+	IOMMU_MODEL_INTEL_VTD,
+	IOMMU_MODEL_ARM_SMMU,
+};
+
+/*
+ * PASID table data used to bind guest PASID table to the host IOMMU. This will
+ * enable guest managed first level page tables.
+ * @ptr		PASID table pointer
+ * @size	size of the guest PASID table in bytes, must be <= host table size
+ * @model	iommu_model number
+ * @length	length of the opaque data in bytes
+ * @opaque	model specific IOMMU data
+ */
+struct pasid_table_info {
+	__u64	ptr;
+	__u64	size;
+	__u32	model;
+	__u32	length;
+	__u8	opaque[];
+};
+
+
+#endif /* _UAPI_IOMMU_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 2/9] iommu/vt-d: add bind_pasid_table function
@ 2017-06-27 19:47   ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan, Liu, Yi L

Add Intel VT-d ops to the generic iommu_bind_pasid_table API
functions.

The primary use case is for direct assignment of SVM capable
device. Originated from emulated IOMMU in the guest, the request goes
through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
passes guest PASID table pointer (GPA) and size.

Device context table entry is modified by Intel IOMMU specific
bind_pasid_table function. This will turn on nesting mode and matching
translation type.

The unbind operation restores default context mapping.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/intel-iommu.c   | 117 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma_remapping.h |   1 +
 2 files changed, 118 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8274ce3..ef05b59 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5430,6 +5430,119 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
 
 	return iommu;
 }
+
+static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
+		struct device *dev, struct pasid_table_info *pasidt_binfo)
+{
+	struct intel_iommu *iommu;
+	struct context_entry *context;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	struct pci_dev *pdev;
+	u8 bus, devfn;
+	u16 did, *sid;
+	int ret = 0;
+	unsigned long flags;
+	u64 ctx_lo;
+
+	if (pasidt_binfo == NULL || pasidt_binfo->model != IOMMU_MODEL_INTEL_VTD) {
+		pr_warn("%s: Invalid bind request!\n", __func__);
+		return -EINVAL;
+	}
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	sid = (u16 *)&pasidt_binfo->opaque;
+	/*
+	 * check SID, if it is not correct, return success to allow looping
+	 * through all devices within a group
+	 */
+	if (PCI_DEVID(bus, devfn) != *sid)
+		return 0;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	pdev = to_pci_dev(dev);
+	if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
+		return -EINVAL;
+
+	info = dev->archdata.iommu;
+	if (!info || !info->pasid_supported ||
+		!pci_enable_pasid(pdev, info->pasid_supported & ~1)) {
+		pr_err("PCI %04x:%02x:%02x.%d: has no PASID support\n",
+			       pci_domain_nr(pdev->bus), bus, PCI_SLOT(devfn),
+			       PCI_FUNC(devfn));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (pasidt_binfo->size > intel_iommu_get_pts(iommu)) {
+		pr_err("Invalid gPASID table size %llu, host size %lu\n",
+			pasidt_binfo->size,
+			intel_iommu_get_pts(iommu));
+		ret = -EINVAL;
+		goto out;
+	}
+	spin_lock_irqsave(&iommu->lock, flags);
+	context = iommu_context_addr(iommu, bus, devfn, 0);
+	if (!context || !context_present(context)) {
+		pr_warn("%s: ctx not present for bus devfn %x:%x\n",
+			__func__, bus, devfn);
+		spin_unlock_irqrestore(&iommu->lock, flags);
+		goto out;
+	}
+
+	/* Anticipate guest to use SVM and owns the first level */
+	ctx_lo = context[0].lo;
+	ctx_lo |= CONTEXT_NESTE;
+	ctx_lo |= CONTEXT_PRS;
+	ctx_lo |= CONTEXT_PASIDE;
+	ctx_lo &= ~CONTEXT_TT_MASK;
+	ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
+	context[0].lo = ctx_lo;
+
+	/* Assign guest PASID table pointer and size */
+	ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
+	context[1].lo = ctx_lo;
+	/* make sure context entry is updated before flushing */
+	wmb();
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	iommu->flush.flush_context(iommu, did,
+				(((u16)bus) << 8) | devfn,
+				DMA_CCMD_MASK_NOBIT,
+				DMA_CCMD_DEVICE_INVL);
+	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+
+out:
+	return ret;
+}
+
+static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
+					struct device *dev)
+{
+	struct intel_iommu *iommu;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u8 bus, devfn;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+	/*
+	 * REVISIT: we might want to clear the PASID table pointer
+	 * as part of context clear operation. Currently, it leaves
+	 * stale data but should be ignored by hardware since PASIDE
+	 * is clear.
+	 */
+	/* ATS will be reenabled when remapping is restored */
+	pci_disable_ats(to_pci_dev(dev));
+	domain_context_clear(iommu, dev);
+	return domain_context_mapping_one(dmar_domain, iommu, bus, devfn);
+}
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
 const struct iommu_ops intel_iommu_ops = {
@@ -5438,6 +5551,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.domain_free		= intel_iommu_domain_free,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.bind_pasid_table	= intel_iommu_bind_pasid_table,
+	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
+#endif
 	.map			= intel_iommu_map,
 	.unmap			= intel_iommu_unmap,
 	.map_sg			= default_iommu_map_sg,
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 9088407..85367b7 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -27,6 +27,7 @@
 
 #define CONTEXT_DINVE		(1ULL << 8)
 #define CONTEXT_PRS		(1ULL << 9)
+#define CONTEXT_NESTE		(1ULL << 10)
 #define CONTEXT_PASIDE		(1ULL << 11)
 
 struct intel_iommu;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 2/9] iommu/vt-d: add bind_pasid_table function
@ 2017-06-27 19:47   ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	Joerg Roedel, David Woodhouse
  Cc: Lan Tianyu, Yi L, Tian,  Kevin,
	Liu-i9wRM+HIrmnmtl4Z8vJ8Kg761KYD1DLY, Jean Delvare

Add Intel VT-d ops to the generic iommu_bind_pasid_table API
functions.

The primary use case is for direct assignment of SVM capable
device. Originated from emulated IOMMU in the guest, the request goes
through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
passes guest PASID table pointer (GPA) and size.

Device context table entry is modified by Intel IOMMU specific
bind_pasid_table function. This will turn on nesting mode and matching
translation type.

The unbind operation restores default context mapping.

Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Signed-off-by: Ashok Raj <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/iommu/intel-iommu.c   | 117 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma_remapping.h |   1 +
 2 files changed, 118 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8274ce3..ef05b59 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5430,6 +5430,119 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
 
 	return iommu;
 }
+
+static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
+		struct device *dev, struct pasid_table_info *pasidt_binfo)
+{
+	struct intel_iommu *iommu;
+	struct context_entry *context;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	struct pci_dev *pdev;
+	u8 bus, devfn;
+	u16 did, *sid;
+	int ret = 0;
+	unsigned long flags;
+	u64 ctx_lo;
+
+	if (pasidt_binfo == NULL || pasidt_binfo->model != IOMMU_MODEL_INTEL_VTD) {
+		pr_warn("%s: Invalid bind request!\n", __func__);
+		return -EINVAL;
+	}
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	sid = (u16 *)&pasidt_binfo->opaque;
+	/*
+	 * check SID, if it is not correct, return success to allow looping
+	 * through all devices within a group
+	 */
+	if (PCI_DEVID(bus, devfn) != *sid)
+		return 0;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	pdev = to_pci_dev(dev);
+	if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
+		return -EINVAL;
+
+	info = dev->archdata.iommu;
+	if (!info || !info->pasid_supported ||
+		!pci_enable_pasid(pdev, info->pasid_supported & ~1)) {
+		pr_err("PCI %04x:%02x:%02x.%d: has no PASID support\n",
+			       pci_domain_nr(pdev->bus), bus, PCI_SLOT(devfn),
+			       PCI_FUNC(devfn));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (pasidt_binfo->size > intel_iommu_get_pts(iommu)) {
+		pr_err("Invalid gPASID table size %llu, host size %lu\n",
+			pasidt_binfo->size,
+			intel_iommu_get_pts(iommu));
+		ret = -EINVAL;
+		goto out;
+	}
+	spin_lock_irqsave(&iommu->lock, flags);
+	context = iommu_context_addr(iommu, bus, devfn, 0);
+	if (!context || !context_present(context)) {
+		pr_warn("%s: ctx not present for bus devfn %x:%x\n",
+			__func__, bus, devfn);
+		spin_unlock_irqrestore(&iommu->lock, flags);
+		goto out;
+	}
+
+	/* Anticipate guest to use SVM and owns the first level */
+	ctx_lo = context[0].lo;
+	ctx_lo |= CONTEXT_NESTE;
+	ctx_lo |= CONTEXT_PRS;
+	ctx_lo |= CONTEXT_PASIDE;
+	ctx_lo &= ~CONTEXT_TT_MASK;
+	ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
+	context[0].lo = ctx_lo;
+
+	/* Assign guest PASID table pointer and size */
+	ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
+	context[1].lo = ctx_lo;
+	/* make sure context entry is updated before flushing */
+	wmb();
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	iommu->flush.flush_context(iommu, did,
+				(((u16)bus) << 8) | devfn,
+				DMA_CCMD_MASK_NOBIT,
+				DMA_CCMD_DEVICE_INVL);
+	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+
+out:
+	return ret;
+}
+
+static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
+					struct device *dev)
+{
+	struct intel_iommu *iommu;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u8 bus, devfn;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+	/*
+	 * REVISIT: we might want to clear the PASID table pointer
+	 * as part of context clear operation. Currently, it leaves
+	 * stale data but should be ignored by hardware since PASIDE
+	 * is clear.
+	 */
+	/* ATS will be reenabled when remapping is restored */
+	pci_disable_ats(to_pci_dev(dev));
+	domain_context_clear(iommu, dev);
+	return domain_context_mapping_one(dmar_domain, iommu, bus, devfn);
+}
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
 const struct iommu_ops intel_iommu_ops = {
@@ -5438,6 +5551,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.domain_free		= intel_iommu_domain_free,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.bind_pasid_table	= intel_iommu_bind_pasid_table,
+	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
+#endif
 	.map			= intel_iommu_map,
 	.unmap			= intel_iommu_unmap,
 	.map_sg			= default_iommu_map_sg,
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 9088407..85367b7 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -27,6 +27,7 @@
 
 #define CONTEXT_DINVE		(1ULL << 8)
 #define CONTEXT_PRS		(1ULL << 9)
+#define CONTEXT_NESTE		(1ULL << 10)
 #define CONTEXT_PASIDE		(1ULL << 11)
 
 struct intel_iommu;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 3/9] iommu: Introduce iommu do invalidate API function
  2017-06-27 19:47 ` Jacob Pan
                   ` (2 preceding siblings ...)
  (?)
@ 2017-06-27 19:47 ` Jacob Pan
  2017-06-28 10:08   ` Joerg Roedel
  -1 siblings, 1 reply; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Liu, Yi L, Liu, Jacob Pan

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

When a SVM capable device is assigned to a guest, the first level page
tables are owned by the guest and the guest PASID table pointer is
linked to the device context entry of the physical IOMMU.

Host IOMMU driver has no knowledge of caching structure updates unless
the guest invalidation activities are passed down to the host. The
primary usage is derived from emulated IOMMU in the guest, where QEMU
can trap invalidation activities before pass them down the
host/physical IOMMU. There are IOMMU architectural specific actions
need to be taken which requires the generic APIs introduced in this
patch to have opaque data in the tlb_invalidate_info argument.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/iommu.c      | 14 ++++++++++++++
 include/linux/iommu.h      | 12 ++++++++++++
 include/uapi/linux/iommu.h | 13 +++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 494309b..d973555 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1347,6 +1347,20 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
 
+int iommu_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	int ret = 0;
+
+	if (unlikely(!domain->ops->invalidate))
+		return -ENODEV;
+
+	ret = domain->ops->invalidate(domain, dev, inv_info);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7122add..fbc08ae 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -186,6 +186,7 @@ struct iommu_resv_region {
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @bind_pasid_table: bind pasid table pointer for guest SVM
  * @unbind_pasid_table: unbind pasid table pointer and restore defaults
+ * @invalidate: invalidate translation caches
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -232,6 +233,8 @@ struct iommu_ops {
 				struct pasid_table_info *pasidt_binfo);
 	int (*unbind_pasid_table)(struct iommu_domain *domain,
 				struct device *dev);
+	int (*invalidate)(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info);
 
 	unsigned long pgsize_bitmap;
 };
@@ -293,6 +296,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain,
 		struct device *dev, struct pasid_table_info *pasidt_binfo);
 extern int iommu_unbind_pasid_table(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info);
+
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
@@ -660,6 +666,12 @@ int iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
 	return -EINVAL;
 }
 
+static inline int iommu_do_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 9089a30..f077353 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -34,5 +34,18 @@ struct pasid_table_info {
 	__u8	opaque[];
 };
 
+/*
+ * Translation cache invalidation information, contains IOMMU model specific
+ * data which can be parsed based on model ID by model specific drivers.
+ *
+ * @model	iommu_model number
+ * @length	length of the opaque data in bytes
+ * @opaque	model specific IOMMU data
+ */
+struct tlb_invalidate_info {
+	__u32	model;
+	__u32	length;
+	__u8	opaque[];
+};
 
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 4/9] iommu/vt-d: Add iommu do invalidate function
  2017-06-27 19:47 ` Jacob Pan
                   ` (3 preceding siblings ...)
  (?)
@ 2017-06-27 19:47 ` Jacob Pan
  -1 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan, Liu, Yi L

This patch adds Intel VT-d specific function to implement
iommu_do_invalidate API.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that guest to host device ID mapping should be
resolved prior to calling IOMMU driver. Based on the device handle,
host IOMMU driver can replace certain fields before submit to the
invalidation queue.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/intel-iommu.c | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h | 11 ++++++++++-
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef05b59..242bb8c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5127,6 +5127,46 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
 	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
 }
 
+static int intel_iommu_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	struct intel_iommu *iommu;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_invalidate_data *inv_data;
+	struct qi_desc *qi;
+	u16 did;
+	u8 bus, devfn;
+
+	if (!inv_info || !dmar_domain || (inv_info->model != IOMMU_MODEL_INTEL_VTD))
+		return -EINVAL;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
+
+	/* check SID */
+	if (PCI_DEVID(bus, devfn) != inv_data->sid)
+		return 0;
+
+	qi = &inv_data->inv_desc;
+
+	switch (qi->low & QI_TYPE_MASK) {
+	case QI_DIOTLB_TYPE:
+	case QI_DEIOTLB_TYPE:
+		/* for device IOTLB, we just let it pass through */
+		break;
+	default:
+		did = dmar_domain->iommu_did[iommu->seq_id];
+		qi->low &= ~QI_DID_MASK;
+		qi->low |= QI_DID(did);
+		break;
+	}
+
+	return qi_submit_sync(qi, iommu);
+}
+
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot)
@@ -5554,6 +5594,7 @@ const struct iommu_ops intel_iommu_ops = {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.bind_pasid_table	= intel_iommu_bind_pasid_table,
 	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
+	.invalidate		= intel_iommu_invalidate,
 #endif
 	.map			= intel_iommu_map,
 	.unmap			= intel_iommu_unmap,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 485a5b4..8df6c91 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -31,7 +31,6 @@
 #include <linux/list.h>
 #include <linux/iommu.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
-
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
 
@@ -258,6 +257,10 @@ enum {
 #define QI_PGRP_RESP_TYPE	0x9
 #define QI_PSTRM_RESP_TYPE	0xa
 
+#define QI_DID(did)		(((u64)did & 0xffff) << 16)
+#define QI_DID_MASK		GENMASK(31, 16)
+#define QI_TYPE_MASK		GENMASK(3, 0)
+
 #define QI_IEC_SELECTIVE	(((u64)1) << 4)
 #define QI_IEC_IIDEX(idx)	(((u64)(idx & 0xffff) << 32))
 #define QI_IEC_IM(m)		(((u64)(m & 0x1f) << 27))
@@ -489,6 +492,12 @@ extern int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
 #endif
 
+struct intel_invalidate_data {
+	u16 sid;
+	u32 pasid;
+	struct qi_desc inv_desc;
+};
+
 extern const struct attribute_group *intel_iommu_groups[];
 
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 5/9] iommu: Introduce fault notifier API
  2017-06-27 19:47 ` Jacob Pan
                   ` (4 preceding siblings ...)
  (?)
@ 2017-06-27 19:47 ` Jacob Pan
  2017-06-28 10:16     ` Joerg Roedel
  -1 siblings, 1 reply; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:47 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan

Traditionally, device specific faults are detected and handled within
their own device drivers. When IOMMU is enabled, faults such as DMA
related transactions are detected by IOMMU. There is no generic
reporting mechanism to report faults back to the in-kernel device
driver or the guest OS in case of assigned devices.

Faults detected by IOMMU is based on the transaction's source ID which
can be reported at per device basis, regardless of the device type is a
PCI device or not.

The fault types includes recoverable (e.g. page request) and
unrecoverable faults(e.g. invalid context). In most cases, faults can be
handled by IOMMU drivers. However, there are use cases that require
fault processing outside IOMMU driver, e.g.

1. page request fault originated from an SVM capable device that is
assigned to guest via vIOMMU. In this case, the first level page tables
are owned by the guest. Page request must be propagated to the guest to
let guest OS fault in the pages then send page response. In this
mechanism, the direct receiver of IOMMU fault notification is VFIO,
which can relay notification events to QEMU or other user space
software.

2. faults need more subtle handling by device drivers. Other than
simply invoke reset function, there are needs to let device driver
handle the fault with a smaller impact.

This patchset is intended to create a generic fault notification API such
that it can scale as follows:
- all IOMMU types
- PCI and non-PCI devices
- recoverable and unrecoverable faults
- VFIO and other other in kernel users
- DMA & IRQ remapping (TBD)

The event data contains both generic and raw architectural data
such that performance is not compromised as the data propagation may
involve many layers.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/iommu.c      | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h      | 23 +++++++++++++++++++++++
 include/uapi/linux/iommu.h | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d973555..07cfd92 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,7 @@ struct iommu_group {
 	struct list_head devices;
 	struct mutex mutex;
 	struct blocking_notifier_head notifier;
+	struct blocking_notifier_head fault_notifier;
 	void *iommu_data;
 	void (*iommu_data_release)(void *iommu_data);
 	char *name;
@@ -345,6 +346,7 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
+	BLOCKING_INIT_NOTIFIER_HEAD(&group->fault_notifier);
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0) {
@@ -790,6 +792,48 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
 /**
+ * iommu_register_fault_notifier - Register a notifier for fault reporting
+ * @group: device's iommu group to notify fault events
+ * @nb: notifier block to signal
+ *
+ */
+int iommu_register_fault_notifier(struct iommu_group *group,
+				struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&group->fault_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(iommu_register_fault_notifier);
+
+/**
+ * iommu_unregister_fault_notifier - Unregister a notifier for fault reporting
+ * @domain: the domain to watch
+ * @nb: notifier block to signal
+ *
+ */
+int iommu_unregister_fault_notifier(struct iommu_group *group,
+				  struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&group->fault_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_fault_notifier);
+
+int iommu_fault_notifier_call_chain(struct iommu_fault_event *event)
+{
+	int ret;
+	struct iommu_group *group = iommu_group_get(event->dev);
+
+	if (!group)
+		return -EINVAL;
+	/* caller provide generic data related to the event, TBD */
+	ret = (blocking_notifier_call_chain(&group->fault_notifier, 0, (void *)event)
+		== NOTIFY_BAD) ? -EINVAL : 0;
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL(iommu_fault_notifier_call_chain);
+
+/**
  * iommu_group_id - Return ID for a group
  * @group: the group to ID
  *
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fbc08ae..ed2f804 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -342,6 +342,12 @@ extern int iommu_group_register_notifier(struct iommu_group *group,
 					 struct notifier_block *nb);
 extern int iommu_group_unregister_notifier(struct iommu_group *group,
 					   struct notifier_block *nb);
+extern int iommu_register_fault_notifier(struct iommu_group *group,
+					 struct notifier_block *nb);
+extern int iommu_unregister_fault_notifier(struct iommu_group *group,
+					 struct notifier_block *nb);
+extern int iommu_fault_notifier_call_chain(struct iommu_fault_event *event);
+
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
@@ -574,6 +580,23 @@ static inline int iommu_group_unregister_notifier(struct iommu_group *group,
 	return 0;
 }
 
+static inline int iommu_register_fault_notifier(struct device *dev,
+						  struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline int iommu_unregister_fault_notifier(struct device *dev,
+						  struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline int iommu_fault_notifier_call_chain(struct iommu_fault_event *event)
+{
+	return 0;
+}
+
 static inline int iommu_group_id(struct iommu_group *group)
 {
 	return -ENODEV;
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index f077353..a8e3d7f 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -48,4 +48,38 @@ struct tlb_invalidate_info {
 	__u8	opaque[];
 };
 
+/*
+ * Generic fault event notification data, used by all IOMMU models
+ *
+ * - PCI and non-PCI devices
+ * - Recoverable faults (e.g. page request) & un-recoverable faults
+ * - DMA remapping and IRQ remapping faults
+ *
+ * @dev The device which faults are reported by IOMMU
+ * @addr tells the offending address
+ * @pasid contains process address space ID, used in shared virtual memory (SVM)
+ * @prot page access protection flag, e.g. IOMMU_READ, IOMMU_WRITE
+ * @flags contains fault type, etc.
+ * @length tells the size of the buf in bytes
+ * @buf contains any raw or arch specific data
+ *
+ */
+struct iommu_fault_event {
+	struct device *dev;
+	__u64 addr;
+	__u32 pasid;
+	__u32 prot;
+	__u32 flags;
+/* page request as result of recoverable translation fault */
+#define IOMMU_FAULT_PAGE_REQ	BIT(0)
+/* unrecoverable fault, e.g. invalid device context  */
+#define IOMMU_FAULT_UNRECOV	BIT(1)
+/* unrecoverable fault related to interrupt remapping */
+#define IOMMU_FAULT_IRQ_REMAP	BIT(2)
+/* unrecoverable fault on invalidation of translation caches */
+#define IOMMU_FAULT_INVAL	BIT(3)
+	__u32 length;
+	__u8  buf[];
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 6/9] iommu/vt-d: track device with pasid table bond to a guest
  2017-06-27 19:47 ` Jacob Pan
                   ` (5 preceding siblings ...)
  (?)
@ 2017-06-27 19:48 ` Jacob Pan
  -1 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:48 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan

When PASID table pointer of an assigned device is bond to a guest,
the first level page tables are managed by the guest. However, only
host/physical IOMMU can detect fault events, e.g. page requests.
Therefore, we need to keep track of which device has its PASID table
pointer bond to a guest such that page request and other events can
be propagated to the guest as needed.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/intel-iommu.c | 19 +------------------
 include/linux/intel-iommu.h | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 242bb8c..d911d47 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -416,24 +416,6 @@ struct dmar_domain {
 					   iommu core */
 };
 
-/* PCI domain-device relationship */
-struct device_domain_info {
-	struct list_head link;	/* link to domain siblings */
-	struct list_head global; /* link to global list */
-	u8 bus;			/* PCI bus number */
-	u8 devfn;		/* PCI devfn number */
-	u8 pasid_supported:3;
-	u8 pasid_enabled:1;
-	u8 pri_supported:1;
-	u8 pri_enabled:1;
-	u8 ats_supported:1;
-	u8 ats_enabled:1;
-	u8 ats_qdep;
-	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
-	struct intel_iommu *iommu; /* IOMMU used by this device */
-	struct dmar_domain *domain; /* pointer to domain */
-};
-
 struct dmar_rmrr_unit {
 	struct list_head list;		/* list of rmrr units	*/
 	struct acpi_dmar_header *hdr;	/* ACPI header		*/
@@ -5555,6 +5537,7 @@ static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
 				DMA_CCMD_MASK_NOBIT,
 				DMA_CCMD_DEVICE_INVL);
 	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+	info->pasid_tbl_bound = 1;
 	spin_unlock_irqrestore(&iommu->lock, flags);
 
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8df6c91..61f81ab 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -434,6 +434,25 @@ struct intel_iommu {
 	u32		flags;      /* Software defined flags */
 };
 
+/* PCI domain-device relationship */
+struct device_domain_info {
+	struct list_head link;	/* link to domain siblings */
+	struct list_head global; /* link to global list */
+	u8 bus;			/* PCI bus number */
+	u8 devfn;		/* PCI devfn number */
+	u8 pasid_supported:3;
+	u8 pasid_enabled:1;
+	u8 pasid_tbl_bound:1;	/* bound to guest PASID table */
+	u8 pri_supported:1;
+	u8 pri_enabled:1;
+	u8 ats_supported:1;
+	u8 ats_enabled:1;
+	u8 ats_qdep;
+	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
+	struct intel_iommu *iommu; /* IOMMU used by this device */
+	struct dmar_domain *domain; /* pointer to domain */
+};
+
 static inline void __iommu_flush_cache(
 	struct intel_iommu *iommu, void *addr, int size)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 7/9] iommu/dmar: notify unrecoverable faults
  2017-06-27 19:47 ` Jacob Pan
                   ` (6 preceding siblings ...)
  (?)
@ 2017-06-27 19:48 ` Jacob Pan
  -1 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:48 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan

Currently, when device DMA faults are detected by IOMMU the fault
reasons are printed but the offending device is not notified.
This patch allows device drivers to be optionally notified for fault
conditions when device specific handling is needed for more subtle
processing, e.g. request with PASID transactions.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/dmar.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index cbf7763..459a472 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1587,6 +1587,39 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
 }
 
+static int dmar_unrecov_fault_notify(u8 fault_reason, u16 source_id,
+			unsigned long long addr)
+{
+	int ret;
+	struct pci_dev *pdev;
+	struct iommu_fault_event *event;
+
+	pdev = pci_get_bus_and_slot(source_id >> 8, source_id & 0xFF);
+	if (!pdev)
+		return -ENODEV;
+	pr_debug("Notify PCI device fault [%02x:%02x.%d]\n",
+		source_id >> 8, PCI_SLOT(source_id & 0xff),
+		PCI_FUNC(source_id & 0xff));
+	event = kzalloc(sizeof(*event) + sizeof(fault_reason), GFP_KERNEL);
+	if (!event) {
+		ret = -ENOMEM;
+		goto exit_dev_put;
+	}
+
+	event->dev = &pdev->dev;
+	event->buf[0] = fault_reason;
+	event->addr = addr;
+	event->length = sizeof(fault_reason);
+	event->flags = IOMMU_FAULT_UNRECOV;
+	ret = iommu_fault_notifier_call_chain(event);
+
+	kfree(event);
+exit_dev_put:
+	pci_dev_put(pdev);
+
+	return ret;
+}
+
 static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 		u8 fault_reason, u16 source_id, unsigned long long addr)
 {
@@ -1600,11 +1633,14 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 			source_id >> 8, PCI_SLOT(source_id & 0xFF),
 			PCI_FUNC(source_id & 0xFF), addr >> 48,
 			fault_reason, reason);
-	else
+	else {
 		pr_err("[%s] Request device [%02x:%02x.%d] fault addr %llx [fault reason %02d] %s\n",
 		       type ? "DMA Read" : "DMA Write",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason);
+		dmar_unrecov_fault_notify(fault_reason, source_id, addr);
+	}
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 8/9] iommu/intel-svm: notify page request to guest
  2017-06-27 19:47 ` Jacob Pan
                   ` (7 preceding siblings ...)
  (?)
@ 2017-06-27 19:48 ` Jacob Pan
  -1 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:48 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan

If the source device of a page request has its PASID table pointer
bond to a guest, the first level page tables are owned by the guest.
In this case, we shall let guest OS to manage page fault.

This patch uses the IOMMU fault notification API to send notifications,
possibly via VFIO, to the guest OS. Once guest pages are fault in, guest
will issue page response which will be passed down via the invalidation
passdown APIs.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/intel-svm.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/iommu.h     |  1 +
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 23c4276..98fca35 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -525,6 +525,88 @@ static bool access_error(struct vm_area_struct *vma, struct page_req_dsc *req)
 	return (requested & ~vma->vm_flags) != 0;
 }
 
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+	int prot = 0;
+
+	if (req->rd_req)
+		prot |= IOMMU_READ;
+	if (req->wr_req)
+		prot |= IOMMU_WRITE;
+	if (req->exe_req)
+		prot |= IOMMU_EXEC;
+	if (req->priv_req)
+		prot |= IOMMU_PRIV;
+
+	return prot;
+}
+
+static int intel_svm_prq_notify(struct device *dev, struct page_req_dsc *desc)
+{
+	int ret = 0;
+	struct iommu_fault_event *event;
+	struct pci_dev *pdev;
+	struct device_domain_info *info;
+	unsigned long buf_offset;
+
+	/**
+	 * If caller does not provide struct device, this is the case where
+	 * guest PASID table is bound to the device. So we need to retrieve
+	 * struct device from the page request descriptor then proceed.
+	 */
+	if (!dev) {
+		pdev = pci_get_bus_and_slot(desc->bus, desc->devfn);
+		if (!pdev) {
+			pr_err("No PCI device found for PRQ [%02x:%02x.%d]\n",
+				desc->bus, PCI_SLOT(desc->devfn),
+				PCI_FUNC(desc->devfn));
+			return -ENODEV;
+		}
+		/**
+		 * Make sure PASID table pointer is bound to guest, if yes notify
+		 * handler in the guest, e.g. via VFIO.
+		 */
+		info = pdev->dev.archdata.iommu;
+		if (!info || !info->pasid_tbl_bound) {
+			pr_debug("PRQ device pasid table not bound.\n");
+			ret = -EINVAL;
+			goto exit_put_dev;
+		}
+		dev = &pdev->dev;
+	} else if (dev_is_pci(dev)) {
+		pdev = to_pci_dev(dev);
+		pci_dev_get(pdev);
+	} else
+		return -ENODEV;
+
+	pr_debug("Notify PRQ device [%02x:%02x.%d]\n",
+		desc->bus, PCI_SLOT(desc->devfn),
+		PCI_FUNC(desc->devfn));
+	event = kzalloc(sizeof(*event) + sizeof(*desc), GFP_KERNEL);
+	if (!event) {
+		ret = -ENOMEM;
+		goto exit_put_dev;
+	}
+
+	/* Fill in event data for device specific processing */
+	event->dev = dev;
+	buf_offset = offsetofend(struct iommu_fault_event, length);
+	memcpy(buf_offset + event, desc, sizeof(*desc));
+	event->addr = desc->addr;
+	event->pasid = desc->pasid;
+	event->prot = prq_to_iommu_prot(desc);
+	event->length = sizeof(*desc);
+	event->flags = IOMMU_FAULT_PAGE_REQ;
+
+	ret = iommu_fault_notifier_call_chain(event);
+	kfree(event);
+
+exit_put_dev:
+	pci_dev_put(pdev);
+
+	return ret;
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
 	struct intel_iommu *iommu = d;
@@ -548,7 +630,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		handled = 1;
 
 		req = &iommu->prq[head / sizeof(*req)];
-
+		/**
+		 * If prq is to be handled outside iommu driver via receiver of
+		 * the fault notifiers, we skip the page response here.
+		 */
+		if (!intel_svm_prq_notify(NULL, req))
+			continue;
 		result = QI_RESP_FAILURE;
 		address = (u64)req->addr << VTD_PAGE_SHIFT;
 		if (!req->pasid_present) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ed2f804..d0f28cd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,7 @@
  * if the IOMMU page table format is equivalent.
  */
 #define IOMMU_PRIV	(1 << 5)
+#define IOMMU_EXEC	(1 << 6)
 
 struct iommu_ops;
 struct iommu_group;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 9/9] iommu/intel-svm: replace dev ops with generic fault notifier
  2017-06-27 19:47 ` Jacob Pan
                   ` (8 preceding siblings ...)
  (?)
@ 2017-06-27 19:48 ` Jacob Pan
  -1 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-27 19:48 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan Tianyu, Tian, Kevin, Raj Ashok, Alex Williamson,
	Jean Delvare, Jacob Pan

Intel SVM devices register callbacks when bind task and PASID. These fault
callbacks are optional which can be replaced by the generic IOMMU fault
notifier APIs.

Currently, there is no main line kernel user of these callback functions.
Fault event data delivered by the IOMMU notification API is both generic
and contains raw data for architectural specific uses.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/iommu/intel-svm.c | 13 ++-----------
 include/linux/intel-svm.h | 20 +++-----------------
 2 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 98fca35..4831c6f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -291,7 +291,7 @@ static const struct mmu_notifier_ops intel_mmuops = {
 
 static DEFINE_MUTEX(pasid_mutex);
 
-int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
+int intel_svm_bind_mm(struct device *dev, int *pasid, int flags)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
 	struct intel_svm_dev *sdev;
@@ -337,10 +337,6 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 
 			list_for_each_entry(sdev, &svm->devs, list) {
 				if (dev == sdev->dev) {
-					if (sdev->ops != ops) {
-						ret = -EBUSY;
-						goto out;
-					}
 					sdev->users++;
 					goto success;
 				}
@@ -366,7 +362,6 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 	}
 	/* Finish the setup now we know we're keeping it */
 	sdev->users = 1;
-	sdev->ops = ops;
 	init_rcu_head(&sdev->rcu);
 
 	if (!svm) {
@@ -701,11 +696,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		if (WARN_ON(&sdev->list == &svm->devs))
 			sdev = NULL;
 
-		if (sdev && sdev->ops && sdev->ops->fault_cb) {
-			int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
-				(req->exe_req << 1) | (req->priv_req);
-			sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr, req->private, rwxp, result);
-		}
+		intel_svm_prq_notify(sdev->dev, req);
 		/* We get here in the error case where the PASID lookup failed,
 		   and these can be NULL. Do not use them below this point! */
 		sdev = NULL;
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 3c25794..67e056d3 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -18,18 +18,6 @@
 
 struct device;
 
-struct svm_dev_ops {
-	void (*fault_cb)(struct device *dev, int pasid, u64 address,
-			 u32 private, int rwxp, int response);
-};
-
-/* Values for rxwp in fault_cb callback */
-#define SVM_REQ_READ	(1<<3)
-#define SVM_REQ_WRITE	(1<<2)
-#define SVM_REQ_EXEC	(1<<1)
-#define SVM_REQ_PRIV	(1<<0)
-
-
 /*
  * The SVM_FLAG_PRIVATE_PASID flag requests a PASID which is *not* the "main"
  * PASID for the current process. Even if a PASID already exists, a new one
@@ -60,7 +48,6 @@ struct svm_dev_ops {
  * @dev:	Device to be granted acccess
  * @pasid:	Address for allocated PASID
  * @flags:	Flags. Later for requesting supervisor mode, etc.
- * @ops:	Callbacks to device driver
  *
  * This function attempts to enable PASID support for the given device.
  * If the @pasid argument is non-%NULL, a PASID is allocated for access
@@ -82,8 +69,7 @@ struct svm_dev_ops {
  * Multiple calls from the same process may result in the same PASID
  * being re-used. A reference count is kept.
  */
-extern int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
-			     struct svm_dev_ops *ops);
+extern int intel_svm_bind_mm(struct device *dev, int *pasid, int flags);
 
 /**
  * intel_svm_unbind_mm() - Unbind a specified PASID
@@ -105,7 +91,7 @@ extern int intel_svm_unbind_mm(struct device *dev, int pasid);
 #else /* CONFIG_INTEL_IOMMU_SVM */
 
 static inline int intel_svm_bind_mm(struct device *dev, int *pasid,
-				    int flags, struct svm_dev_ops *ops)
+				int flags)
 {
 	return -ENOSYS;
 }
@@ -116,6 +102,6 @@ static inline int intel_svm_unbind_mm(struct device *dev, int pasid)
 }
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
-#define intel_svm_available(dev) (!intel_svm_bind_mm((dev), NULL, 0, NULL))
+#define intel_svm_available(dev) (!intel_svm_bind_mm((dev), NULL, 0))
 
 #endif /* __INTEL_SVM_H__ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/9] iommu: Introduce bind_pasid_table API function
@ 2017-06-28  9:57     ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28  9:57 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare, Liu, Yi L

On Tue, Jun 27, 2017 at 12:47:55PM -0700, Jacob Pan wrote:
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)
> +{
> +	if (unlikely(!domain->ops->bind_pasid_table))
> +		return -EINVAL;

I think its better to return -ENODEV here, like other iommu-api
functions do when a callback is NULL.

> +enum iommu_model {
> +	IOMMU_MODEL_INTEL_VTD,
> +	IOMMU_MODEL_ARM_SMMU,
> +};

AMD IOMMU also supports shared virtual memory.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/9] iommu: Introduce bind_pasid_table API function
@ 2017-06-28  9:57     ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28  9:57 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Lan Tianyu, Liu-zLv9SwRftAIdnm+yROfE0A, Tian, Kevin, Yi L, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Tue, Jun 27, 2017 at 12:47:55PM -0700, Jacob Pan wrote:
> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> +			struct pasid_table_info *pasidt_binfo)
> +{
> +	if (unlikely(!domain->ops->bind_pasid_table))
> +		return -EINVAL;

I think its better to return -ENODEV here, like other iommu-api
functions do when a callback is NULL.

> +enum iommu_model {
> +	IOMMU_MODEL_INTEL_VTD,
> +	IOMMU_MODEL_ARM_SMMU,
> +};

AMD IOMMU also supports shared virtual memory.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/9] iommu/vt-d: add bind_pasid_table function
@ 2017-06-28 10:02     ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28 10:02 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare, Liu, Yi L

On Tue, Jun 27, 2017 at 12:47:56PM -0700, Jacob Pan wrote:
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
> 
> The primary use case is for direct assignment of SVM capable
> device. Originated from emulated IOMMU in the guest, the request goes
> through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
> passes guest PASID table pointer (GPA) and size.
> 
> Device context table entry is modified by Intel IOMMU specific
> bind_pasid_table function. This will turn on nesting mode and matching
> translation type.
> 
> The unbind operation restores default context mapping.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  drivers/iommu/intel-iommu.c   | 117 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 118 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 8274ce3..ef05b59 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5430,6 +5430,119 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
>  
>  	return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo)
> +{
> +	struct intel_iommu *iommu;
> +	struct context_entry *context;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct pci_dev *pdev;
> +	u8 bus, devfn;
> +	u16 did, *sid;
> +	int ret = 0;
> +	unsigned long flags;
> +	u64 ctx_lo;
> +
> +	if (pasidt_binfo == NULL || pasidt_binfo->model != IOMMU_MODEL_INTEL_VTD) {
> +		pr_warn("%s: Invalid bind request!\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	sid = (u16 *)&pasidt_binfo->opaque;
> +	/*
> +	 * check SID, if it is not correct, return success to allow looping
> +	 * through all devices within a group
> +	 */
> +	if (PCI_DEVID(bus, devfn) != *sid)
> +		return 0;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	pdev = to_pci_dev(dev);
> +	if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> +		return -EINVAL;
> +
> +	info = dev->archdata.iommu;
> +	if (!info || !info->pasid_supported ||
> +		!pci_enable_pasid(pdev, info->pasid_supported & ~1)) {
> +		pr_err("PCI %04x:%02x:%02x.%d: has no PASID support\n",
> +			       pci_domain_nr(pdev->bus), bus, PCI_SLOT(devfn),
> +			       PCI_FUNC(devfn));

You can either use dev_name() here to decode the device name, or just
use dev_err() instead.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/9] iommu/vt-d: add bind_pasid_table function
@ 2017-06-28 10:02     ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28 10:02 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Lan Tianyu, Liu-zLv9SwRftAIdnm+yROfE0A, Tian, Kevin, Yi L, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Tue, Jun 27, 2017 at 12:47:56PM -0700, Jacob Pan wrote:
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
> 
> The primary use case is for direct assignment of SVM capable
> device. Originated from emulated IOMMU in the guest, the request goes
> through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
> passes guest PASID table pointer (GPA) and size.
> 
> Device context table entry is modified by Intel IOMMU specific
> bind_pasid_table function. This will turn on nesting mode and matching
> translation type.
> 
> The unbind operation restores default context mapping.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Ashok Raj <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/iommu/intel-iommu.c   | 117 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 118 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 8274ce3..ef05b59 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5430,6 +5430,119 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
>  
>  	return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo)
> +{
> +	struct intel_iommu *iommu;
> +	struct context_entry *context;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct pci_dev *pdev;
> +	u8 bus, devfn;
> +	u16 did, *sid;
> +	int ret = 0;
> +	unsigned long flags;
> +	u64 ctx_lo;
> +
> +	if (pasidt_binfo == NULL || pasidt_binfo->model != IOMMU_MODEL_INTEL_VTD) {
> +		pr_warn("%s: Invalid bind request!\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	sid = (u16 *)&pasidt_binfo->opaque;
> +	/*
> +	 * check SID, if it is not correct, return success to allow looping
> +	 * through all devices within a group
> +	 */
> +	if (PCI_DEVID(bus, devfn) != *sid)
> +		return 0;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	pdev = to_pci_dev(dev);
> +	if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> +		return -EINVAL;
> +
> +	info = dev->archdata.iommu;
> +	if (!info || !info->pasid_supported ||
> +		!pci_enable_pasid(pdev, info->pasid_supported & ~1)) {
> +		pr_err("PCI %04x:%02x:%02x.%d: has no PASID support\n",
> +			       pci_domain_nr(pdev->bus), bus, PCI_SLOT(devfn),
> +			       PCI_FUNC(devfn));

You can either use dev_name() here to decode the device name, or just
use dev_err() instead.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
  2017-06-27 19:47 ` [PATCH 3/9] iommu: Introduce iommu do invalidate API function Jacob Pan
@ 2017-06-28 10:08   ` Joerg Roedel
  2017-06-28 16:09       ` Jacob Pan
  0 siblings, 1 reply; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28 10:08 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare, Liu, Yi L

On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> When a SVM capable device is assigned to a guest, the first level page
> tables are owned by the guest and the guest PASID table pointer is
> linked to the device context entry of the physical IOMMU.
> 
> Host IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation activities are passed down to the host. The
> primary usage is derived from emulated IOMMU in the guest, where QEMU
> can trap invalidation activities before pass them down the
> host/physical IOMMU. There are IOMMU architectural specific actions
> need to be taken which requires the generic APIs introduced in this
> patch to have opaque data in the tlb_invalidate_info argument.

Which "IOMMU architectural specific actions" are you thinking of?

> +int iommu_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct tlb_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +
> +	if (unlikely(!domain->ops->invalidate))
> +		return -ENODEV;
> +
> +	ret = domain->ops->invalidate(domain, dev, inv_info);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_invalidate);

[...]

> +struct tlb_invalidate_info {
> +	__u32	model;
> +	__u32	length;
> +	__u8	opaque[];
> +};

This interface is aweful. It requires the user of a generic api to know
details about the implementation behind to do anything useful.

Please explain in more detail why this is needed. My feeling is that we
can make this more generic with a small set of invalidation functions in
the iommu-api.



	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] iommu: Introduce fault notifier API
@ 2017-06-28 10:16     ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28 10:16 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare

On Tue, Jun 27, 2017 at 12:47:59PM -0700, Jacob Pan wrote:
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d973555..07cfd92 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -48,6 +48,7 @@ struct iommu_group {
>  	struct list_head devices;
>  	struct mutex mutex;
>  	struct blocking_notifier_head notifier;
> +	struct blocking_notifier_head fault_notifier;

Do you really need a notifier chain here? Will there ever be more than
one callback registered to it?

> +struct iommu_fault_event {
> +	struct device *dev;

Putting a 'struct device *' member in a uapi struct looks fundamentally
wrong.



	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] iommu: Introduce fault notifier API
@ 2017-06-28 10:16     ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-06-28 10:16 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Lan Tianyu, Tian, Kevin, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Tue, Jun 27, 2017 at 12:47:59PM -0700, Jacob Pan wrote:
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d973555..07cfd92 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -48,6 +48,7 @@ struct iommu_group {
>  	struct list_head devices;
>  	struct mutex mutex;
>  	struct blocking_notifier_head notifier;
> +	struct blocking_notifier_head fault_notifier;

Do you really need a notifier chain here? Will there ever be more than
one callback registered to it?

> +struct iommu_fault_event {
> +	struct device *dev;

Putting a 'struct device *' member in a uapi struct looks fundamentally
wrong.



	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-06-28 16:09       ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-28 16:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare, Liu, Yi L,
	jacob.jun.pan

On Wed, 28 Jun 2017 12:08:23 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
> > From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> > 
> > When a SVM capable device is assigned to a guest, the first level
> > page tables are owned by the guest and the guest PASID table
> > pointer is linked to the device context entry of the physical IOMMU.
> > 
> > Host IOMMU driver has no knowledge of caching structure updates
> > unless the guest invalidation activities are passed down to the
> > host. The primary usage is derived from emulated IOMMU in the
> > guest, where QEMU can trap invalidation activities before pass them
> > down the host/physical IOMMU. There are IOMMU architectural
> > specific actions need to be taken which requires the generic APIs
> > introduced in this patch to have opaque data in the
> > tlb_invalidate_info argument.  
> 
> Which "IOMMU architectural specific actions" are you thinking of?
> 
construction of queued invalidation descriptors, then submit them to
the IOMMU QI interface.
> > +int iommu_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info
> > *inv_info) +{
> > +	int ret = 0;
> > +
> > +	if (unlikely(!domain->ops->invalidate))
> > +		return -ENODEV;
> > +
> > +	ret = domain->ops->invalidate(domain, dev, inv_info);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_invalidate);  
> 
> [...]
> 
> > +struct tlb_invalidate_info {
> > +	__u32	model;
> > +	__u32	length;
> > +	__u8	opaque[];
> > +};  
> 
> This interface is aweful. It requires the user of a generic api to
> know details about the implementation behind to do anything useful.
> 
> Please explain in more detail why this is needed. My feeling is that
> we can make this more generic with a small set of invalidation
> functions in the iommu-api.
> 
My thinking was that via configuration control, there will be unlikely
any mixed IOMMU models between pIOMMU and vIOMMU. We could have just
model specific data pass through layers of SW (QEMU, VFIO) for
performance reasons. We do have an earlier hybrid version that has
generic data and opaque raw data. Would the below work for all IOMMU
models?

https://www.spinics.net/lists/kvm/msg148798.html

struct tlb_invalidate_info
{
        __u32   model;  /* Vendor number */
        __u8 granularity
#define DEVICE_SELECTVIE_INV    (1 << 0)
#define PAGE_SELECTIVE_INV      (1 << 0)
#define PASID_SELECTIVE_INV     (1 << 1)
        __u32 pasid;
        __u64 addr;
        __u64 size;

        /* Since IOMMU format has already been validated for this table,
           the IOMMU driver knows that the following structure is in a
           format it knows */
        __u8 opaque[];
};

> 
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-06-28 16:09       ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-28 16:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lan Tianyu, Liu, Yi L, Tian, Kevin, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Wed, 28 Jun 2017 12:08:23 +0200
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:

> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
> > From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > 
> > When a SVM capable device is assigned to a guest, the first level
> > page tables are owned by the guest and the guest PASID table
> > pointer is linked to the device context entry of the physical IOMMU.
> > 
> > Host IOMMU driver has no knowledge of caching structure updates
> > unless the guest invalidation activities are passed down to the
> > host. The primary usage is derived from emulated IOMMU in the
> > guest, where QEMU can trap invalidation activities before pass them
> > down the host/physical IOMMU. There are IOMMU architectural
> > specific actions need to be taken which requires the generic APIs
> > introduced in this patch to have opaque data in the
> > tlb_invalidate_info argument.  
> 
> Which "IOMMU architectural specific actions" are you thinking of?
> 
construction of queued invalidation descriptors, then submit them to
the IOMMU QI interface.
> > +int iommu_invalidate(struct iommu_domain *domain,
> > +		struct device *dev, struct tlb_invalidate_info
> > *inv_info) +{
> > +	int ret = 0;
> > +
> > +	if (unlikely(!domain->ops->invalidate))
> > +		return -ENODEV;
> > +
> > +	ret = domain->ops->invalidate(domain, dev, inv_info);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_invalidate);  
> 
> [...]
> 
> > +struct tlb_invalidate_info {
> > +	__u32	model;
> > +	__u32	length;
> > +	__u8	opaque[];
> > +};  
> 
> This interface is aweful. It requires the user of a generic api to
> know details about the implementation behind to do anything useful.
> 
> Please explain in more detail why this is needed. My feeling is that
> we can make this more generic with a small set of invalidation
> functions in the iommu-api.
> 
My thinking was that via configuration control, there will be unlikely
any mixed IOMMU models between pIOMMU and vIOMMU. We could have just
model specific data pass through layers of SW (QEMU, VFIO) for
performance reasons. We do have an earlier hybrid version that has
generic data and opaque raw data. Would the below work for all IOMMU
models?

https://www.spinics.net/lists/kvm/msg148798.html

struct tlb_invalidate_info
{
        __u32   model;  /* Vendor number */
        __u8 granularity
#define DEVICE_SELECTVIE_INV    (1 << 0)
#define PAGE_SELECTIVE_INV      (1 << 0)
#define PASID_SELECTIVE_INV     (1 << 1)
        __u32 pasid;
        __u64 addr;
        __u64 size;

        /* Since IOMMU format has already been validated for this table,
           the IOMMU driver knows that the following structure is in a
           format it knows */
        __u8 opaque[];
};

> 
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] iommu: Introduce fault notifier API
@ 2017-06-28 16:16       ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-28 16:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare, jacob.jun.pan

On Wed, 28 Jun 2017 12:16:03 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> On Tue, Jun 27, 2017 at 12:47:59PM -0700, Jacob Pan wrote:
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index d973555..07cfd92 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -48,6 +48,7 @@ struct iommu_group {
> >  	struct list_head devices;
> >  	struct mutex mutex;
> >  	struct blocking_notifier_head notifier;
> > +	struct blocking_notifier_head fault_notifier;  
> 
> Do you really need a notifier chain here? Will there ever be more than
> one callback registered to it?
> 
yes, this notifier chain is shared by all devices under a group. the
event contains device info which notifier callbacks can filter.
> > +struct iommu_fault_event {
> > +	struct device *dev;  
> 
> Putting a 'struct device *' member in a uapi struct looks
> fundamentally wrong.
> 
> 
my mistake, it was originally (RFC) not in uapi but with the
consideration of using vfio to expose it to user space I have moved it
to uapi. But you are right, it should be some other forms of device
representation used by vfio. VFIO layer has to do the translation and
inject that into the guest. In kernel driver users can use struct
device to identify the faulting device.

> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] iommu: Introduce fault notifier API
@ 2017-06-28 16:16       ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-06-28 16:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lan Tianyu, Tian, Kevin, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Wed, 28 Jun 2017 12:16:03 +0200
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:

> On Tue, Jun 27, 2017 at 12:47:59PM -0700, Jacob Pan wrote:
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index d973555..07cfd92 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -48,6 +48,7 @@ struct iommu_group {
> >  	struct list_head devices;
> >  	struct mutex mutex;
> >  	struct blocking_notifier_head notifier;
> > +	struct blocking_notifier_head fault_notifier;  
> 
> Do you really need a notifier chain here? Will there ever be more than
> one callback registered to it?
> 
yes, this notifier chain is shared by all devices under a group. the
event contains device info which notifier callbacks can filter.
> > +struct iommu_fault_event {
> > +	struct device *dev;  
> 
> Putting a 'struct device *' member in a uapi struct looks
> fundamentally wrong.
> 
> 
my mistake, it was originally (RFC) not in uapi but with the
consideration of using vfio to expose it to user space I have moved it
to uapi. But you are right, it should be some other forms of device
representation used by vfio. VFIO layer has to do the translation and
inject that into the guest. In kernel driver users can use struct
device to identify the faulting device.

> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
  2017-06-28 16:09       ` Jacob Pan
@ 2017-06-28 17:07         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 38+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-28 17:07 UTC (permalink / raw)
  To: Jacob Pan, Joerg Roedel
  Cc: Lan Tianyu, Liu, Yi L, Tian, Kevin, LKML, iommu, Jean Delvare,
	David Woodhouse

On 28/06/17 17:09, Jacob Pan wrote:
> On Wed, 28 Jun 2017 12:08:23 +0200
> Joerg Roedel <joro@8bytes.org> wrote:
> 
>> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
>>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
>>>
>>> When a SVM capable device is assigned to a guest, the first level
>>> page tables are owned by the guest and the guest PASID table
>>> pointer is linked to the device context entry of the physical IOMMU.
>>>
>>> Host IOMMU driver has no knowledge of caching structure updates
>>> unless the guest invalidation activities are passed down to the
>>> host. The primary usage is derived from emulated IOMMU in the
>>> guest, where QEMU can trap invalidation activities before pass them
>>> down the host/physical IOMMU. There are IOMMU architectural
>>> specific actions need to be taken which requires the generic APIs
>>> introduced in this patch to have opaque data in the
>>> tlb_invalidate_info argument.  
>>
>> Which "IOMMU architectural specific actions" are you thinking of?
>>
> construction of queued invalidation descriptors, then submit them to
> the IOMMU QI interface.
>>> +int iommu_invalidate(struct iommu_domain *domain,
>>> +		struct device *dev, struct tlb_invalidate_info
>>> *inv_info) +{
>>> +	int ret = 0;
>>> +
>>> +	if (unlikely(!domain->ops->invalidate))
>>> +		return -ENODEV;
>>> +
>>> +	ret = domain->ops->invalidate(domain, dev, inv_info);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_invalidate);  
>>
>> [...]
>>
>>> +struct tlb_invalidate_info {
>>> +	__u32	model;
>>> +	__u32	length;
>>> +	__u8	opaque[];
>>> +};  
>>
>> This interface is aweful. It requires the user of a generic api to
>> know details about the implementation behind to do anything useful.
>>
>> Please explain in more detail why this is needed. My feeling is that
>> we can make this more generic with a small set of invalidation
>> functions in the iommu-api.
>>
> My thinking was that via configuration control, there will be unlikely
> any mixed IOMMU models between pIOMMU and vIOMMU. We could have just
> model specific data pass through layers of SW (QEMU, VFIO) for
> performance reasons. We do have an earlier hybrid version that has
> generic data and opaque raw data. Would the below work for all IOMMU
> models?

For reference, this was also discussed in the initial posting of the series:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03452.html

At least for ARM SMMUv2 and v3, I think the invalidation format you
propose should be sufficient, although "device_selective" should probably
be "domain_selective". And maybe a flag field could contain relatively
generic hints such as "only invalidate leaf table when page_selective".

Thanks,
Jean

> https://www.spinics.net/lists/kvm/msg148798.html
> 
> struct tlb_invalidate_info
> {
>         __u32   model;  /* Vendor number */
>         __u8 granularity
> #define DEVICE_SELECTVIE_INV    (1 << 0)
> #define PAGE_SELECTIVE_INV      (1 << 0)
> #define PASID_SELECTIVE_INV     (1 << 1)
>         __u32 pasid;
>         __u64 addr;
>         __u64 size;
> 
>         /* Since IOMMU format has already been validated for this table,
>            the IOMMU driver knows that the following structure is in a
>            format it knows */
>         __u8 opaque[];
> };
> 
>>
>>
>> 	Joerg
>>
> 
> [Jacob Pan]
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-06-28 17:07         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 38+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-28 17:07 UTC (permalink / raw)
  To: Jacob Pan, Joerg Roedel
  Cc: Lan Tianyu, Liu, Yi L, Tian, Kevin, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On 28/06/17 17:09, Jacob Pan wrote:
> On Wed, 28 Jun 2017 12:08:23 +0200
> Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
> 
>> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
>>> From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
>>>
>>> When a SVM capable device is assigned to a guest, the first level
>>> page tables are owned by the guest and the guest PASID table
>>> pointer is linked to the device context entry of the physical IOMMU.
>>>
>>> Host IOMMU driver has no knowledge of caching structure updates
>>> unless the guest invalidation activities are passed down to the
>>> host. The primary usage is derived from emulated IOMMU in the
>>> guest, where QEMU can trap invalidation activities before pass them
>>> down the host/physical IOMMU. There are IOMMU architectural
>>> specific actions need to be taken which requires the generic APIs
>>> introduced in this patch to have opaque data in the
>>> tlb_invalidate_info argument.  
>>
>> Which "IOMMU architectural specific actions" are you thinking of?
>>
> construction of queued invalidation descriptors, then submit them to
> the IOMMU QI interface.
>>> +int iommu_invalidate(struct iommu_domain *domain,
>>> +		struct device *dev, struct tlb_invalidate_info
>>> *inv_info) +{
>>> +	int ret = 0;
>>> +
>>> +	if (unlikely(!domain->ops->invalidate))
>>> +		return -ENODEV;
>>> +
>>> +	ret = domain->ops->invalidate(domain, dev, inv_info);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_invalidate);  
>>
>> [...]
>>
>>> +struct tlb_invalidate_info {
>>> +	__u32	model;
>>> +	__u32	length;
>>> +	__u8	opaque[];
>>> +};  
>>
>> This interface is aweful. It requires the user of a generic api to
>> know details about the implementation behind to do anything useful.
>>
>> Please explain in more detail why this is needed. My feeling is that
>> we can make this more generic with a small set of invalidation
>> functions in the iommu-api.
>>
> My thinking was that via configuration control, there will be unlikely
> any mixed IOMMU models between pIOMMU and vIOMMU. We could have just
> model specific data pass through layers of SW (QEMU, VFIO) for
> performance reasons. We do have an earlier hybrid version that has
> generic data and opaque raw data. Would the below work for all IOMMU
> models?

For reference, this was also discussed in the initial posting of the series:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03452.html

At least for ARM SMMUv2 and v3, I think the invalidation format you
propose should be sufficient, although "device_selective" should probably
be "domain_selective". And maybe a flag field could contain relatively
generic hints such as "only invalidate leaf table when page_selective".

Thanks,
Jean

> https://www.spinics.net/lists/kvm/msg148798.html
> 
> struct tlb_invalidate_info
> {
>         __u32   model;  /* Vendor number */
>         __u8 granularity
> #define DEVICE_SELECTVIE_INV    (1 << 0)
> #define PAGE_SELECTIVE_INV      (1 << 0)
> #define PASID_SELECTIVE_INV     (1 << 1)
>         __u32 pasid;
>         __u64 addr;
>         __u64 size;
> 
>         /* Since IOMMU format has already been validated for this table,
>            the IOMMU driver knows that the following structure is in a
>            format it knows */
>         __u8 opaque[];
> };
> 
>>
>>
>> 	Joerg
>>
> 
> [Jacob Pan]
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 2/9] iommu/vt-d: add bind_pasid_table function
@ 2017-07-05  7:38     ` Tian, Kevin
  0 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2017-07-05  7:38 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Liu, Yi L, Lan, Tianyu, Raj, Ashok, Alex Williamson, Jean Delvare

> From: Jacob Pan [mailto:jacob.jun.pan@linux.intel.com]
> Sent: Wednesday, June 28, 2017 3:48 AM
> 
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
> 
> The primary use case is for direct assignment of SVM capable
> device. Originated from emulated IOMMU in the guest, the request goes
> through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
> passes guest PASID table pointer (GPA) and size.
> 
> Device context table entry is modified by Intel IOMMU specific
> bind_pasid_table function. This will turn on nesting mode and matching
> translation type.
> 
> The unbind operation restores default context mapping.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  drivers/iommu/intel-iommu.c   | 117
> ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 118 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 8274ce3..ef05b59 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5430,6 +5430,119 @@ struct intel_iommu
> *intel_svm_device_to_iommu(struct device *dev)
> 
>  	return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo)
> +{
> +	struct intel_iommu *iommu;
> +	struct context_entry *context;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct pci_dev *pdev;
> +	u8 bus, devfn;
> +	u16 did, *sid;
> +	int ret = 0;
> +	unsigned long flags;
> +	u64 ctx_lo;
> +
> +	if (pasidt_binfo == NULL || pasidt_binfo->model !=
> IOMMU_MODEL_INTEL_VTD) {
> +		pr_warn("%s: Invalid bind request!\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	sid = (u16 *)&pasidt_binfo->opaque;
> +	/*
> +	 * check SID, if it is not correct, return success to allow looping
> +	 * through all devices within a group
> +	 */

Can you elaborate this comment since it's related to caller behavior?
Also better move to earlier as the comment for whole function...

> +	if (PCI_DEVID(bus, devfn) != *sid)
> +		return 0;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	pdev = to_pci_dev(dev);
> +	if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> +		return -EINVAL;
> +
> +	info = dev->archdata.iommu;
> +	if (!info || !info->pasid_supported ||
> +		!pci_enable_pasid(pdev, info->pasid_supported & ~1)) {
> +		pr_err("PCI %04x:%02x:%02x.%d: has no PASID support\n",
> +			       pci_domain_nr(pdev->bus), bus, PCI_SLOT(devfn),
> +			       PCI_FUNC(devfn));
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (pasidt_binfo->size > intel_iommu_get_pts(iommu)) {
> +		pr_err("Invalid gPASID table size %llu, host size %lu\n",
> +			pasidt_binfo->size,
> +			intel_iommu_get_pts(iommu));
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	spin_lock_irqsave(&iommu->lock, flags);
> +	context = iommu_context_addr(iommu, bus, devfn, 0);
> +	if (!context || !context_present(context)) {
> +		pr_warn("%s: ctx not present for bus devfn %x:%x\n",
> +			__func__, bus, devfn);
> +		spin_unlock_irqrestore(&iommu->lock, flags);
> +		goto out;
> +	}
> +
> +	/* Anticipate guest to use SVM and owns the first level */
> +	ctx_lo = context[0].lo;
> +	ctx_lo |= CONTEXT_NESTE;
> +	ctx_lo |= CONTEXT_PRS;
> +	ctx_lo |= CONTEXT_PASIDE;
> +	ctx_lo &= ~CONTEXT_TT_MASK;
> +	ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
> +	context[0].lo = ctx_lo;
> +
> +	/* Assign guest PASID table pointer and size */
> +	ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
> +	context[1].lo = ctx_lo;
> +	/* make sure context entry is updated before flushing */
> +	wmb();
> +	did = dmar_domain->iommu_did[iommu->seq_id];
> +	iommu->flush.flush_context(iommu, did,
> +				(((u16)bus) << 8) | devfn,
> +				DMA_CCMD_MASK_NOBIT,
> +				DMA_CCMD_DEVICE_INVL);
> +	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> +	spin_unlock_irqrestore(&iommu->lock, flags);
> +
> +
> +out:
> +	return ret;
> +}
> +
> +static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
> +					struct device *dev)
> +{
> +	struct intel_iommu *iommu;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	u8 bus, devfn;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +	/*
> +	 * REVISIT: we might want to clear the PASID table pointer
> +	 * as part of context clear operation. Currently, it leaves
> +	 * stale data but should be ignored by hardware since PASIDE
> +	 * is clear.
> +	 */
> +	/* ATS will be reenabled when remapping is restored */
> +	pci_disable_ats(to_pci_dev(dev));
> +	domain_context_clear(iommu, dev);
> +	return domain_context_mapping_one(dmar_domain, iommu, bus,
> devfn);
> +}
>  #endif /* CONFIG_INTEL_IOMMU_SVM */
> 
>  const struct iommu_ops intel_iommu_ops = {
> @@ -5438,6 +5551,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.domain_free		= intel_iommu_domain_free,
>  	.attach_dev		= intel_iommu_attach_device,
>  	.detach_dev		= intel_iommu_detach_device,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.bind_pasid_table	= intel_iommu_bind_pasid_table,
> +	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
> +#endif
>  	.map			= intel_iommu_map,
>  	.unmap			= intel_iommu_unmap,
>  	.map_sg			= default_iommu_map_sg,
> diff --git a/include/linux/dma_remapping.h
> b/include/linux/dma_remapping.h
> index 9088407..85367b7 100644
> --- a/include/linux/dma_remapping.h
> +++ b/include/linux/dma_remapping.h
> @@ -27,6 +27,7 @@
> 
>  #define CONTEXT_DINVE		(1ULL << 8)
>  #define CONTEXT_PRS		(1ULL << 9)
> +#define CONTEXT_NESTE		(1ULL << 10)
>  #define CONTEXT_PASIDE		(1ULL << 11)
> 
>  struct intel_iommu;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 2/9] iommu/vt-d: add bind_pasid_table function
@ 2017-07-05  7:38     ` Tian, Kevin
  0 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2017-07-05  7:38 UTC (permalink / raw)
  To: Jacob Pan, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	LKML, Joerg Roedel, David Woodhouse
  Cc: Lan, Tianyu, Jean Delvare

> From: Jacob Pan [mailto:jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org]
> Sent: Wednesday, June 28, 2017 3:48 AM
> 
> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
> 
> The primary use case is for direct assignment of SVM capable
> device. Originated from emulated IOMMU in the guest, the request goes
> through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
> passes guest PASID table pointer (GPA) and size.
> 
> Device context table entry is modified by Intel IOMMU specific
> bind_pasid_table function. This will turn on nesting mode and matching
> translation type.
> 
> The unbind operation restores default context mapping.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Liu, Yi L <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Signed-off-by: Ashok Raj <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/iommu/intel-iommu.c   | 117
> ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 118 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 8274ce3..ef05b59 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5430,6 +5430,119 @@ struct intel_iommu
> *intel_svm_device_to_iommu(struct device *dev)
> 
>  	return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> +		struct device *dev, struct pasid_table_info *pasidt_binfo)
> +{
> +	struct intel_iommu *iommu;
> +	struct context_entry *context;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct pci_dev *pdev;
> +	u8 bus, devfn;
> +	u16 did, *sid;
> +	int ret = 0;
> +	unsigned long flags;
> +	u64 ctx_lo;
> +
> +	if (pasidt_binfo == NULL || pasidt_binfo->model !=
> IOMMU_MODEL_INTEL_VTD) {
> +		pr_warn("%s: Invalid bind request!\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	sid = (u16 *)&pasidt_binfo->opaque;
> +	/*
> +	 * check SID, if it is not correct, return success to allow looping
> +	 * through all devices within a group
> +	 */

Can you elaborate this comment since it's related to caller behavior?
Also better move to earlier as the comment for whole function...

> +	if (PCI_DEVID(bus, devfn) != *sid)
> +		return 0;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	pdev = to_pci_dev(dev);
> +	if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> +		return -EINVAL;
> +
> +	info = dev->archdata.iommu;
> +	if (!info || !info->pasid_supported ||
> +		!pci_enable_pasid(pdev, info->pasid_supported & ~1)) {
> +		pr_err("PCI %04x:%02x:%02x.%d: has no PASID support\n",
> +			       pci_domain_nr(pdev->bus), bus, PCI_SLOT(devfn),
> +			       PCI_FUNC(devfn));
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (pasidt_binfo->size > intel_iommu_get_pts(iommu)) {
> +		pr_err("Invalid gPASID table size %llu, host size %lu\n",
> +			pasidt_binfo->size,
> +			intel_iommu_get_pts(iommu));
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	spin_lock_irqsave(&iommu->lock, flags);
> +	context = iommu_context_addr(iommu, bus, devfn, 0);
> +	if (!context || !context_present(context)) {
> +		pr_warn("%s: ctx not present for bus devfn %x:%x\n",
> +			__func__, bus, devfn);
> +		spin_unlock_irqrestore(&iommu->lock, flags);
> +		goto out;
> +	}
> +
> +	/* Anticipate guest to use SVM and owns the first level */
> +	ctx_lo = context[0].lo;
> +	ctx_lo |= CONTEXT_NESTE;
> +	ctx_lo |= CONTEXT_PRS;
> +	ctx_lo |= CONTEXT_PASIDE;
> +	ctx_lo &= ~CONTEXT_TT_MASK;
> +	ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
> +	context[0].lo = ctx_lo;
> +
> +	/* Assign guest PASID table pointer and size */
> +	ctx_lo = (pasidt_binfo->ptr & VTD_PAGE_MASK) | pasidt_binfo->size;
> +	context[1].lo = ctx_lo;
> +	/* make sure context entry is updated before flushing */
> +	wmb();
> +	did = dmar_domain->iommu_did[iommu->seq_id];
> +	iommu->flush.flush_context(iommu, did,
> +				(((u16)bus) << 8) | devfn,
> +				DMA_CCMD_MASK_NOBIT,
> +				DMA_CCMD_DEVICE_INVL);
> +	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> +	spin_unlock_irqrestore(&iommu->lock, flags);
> +
> +
> +out:
> +	return ret;
> +}
> +
> +static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
> +					struct device *dev)
> +{
> +	struct intel_iommu *iommu;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	u8 bus, devfn;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +	/*
> +	 * REVISIT: we might want to clear the PASID table pointer
> +	 * as part of context clear operation. Currently, it leaves
> +	 * stale data but should be ignored by hardware since PASIDE
> +	 * is clear.
> +	 */
> +	/* ATS will be reenabled when remapping is restored */
> +	pci_disable_ats(to_pci_dev(dev));
> +	domain_context_clear(iommu, dev);
> +	return domain_context_mapping_one(dmar_domain, iommu, bus,
> devfn);
> +}
>  #endif /* CONFIG_INTEL_IOMMU_SVM */
> 
>  const struct iommu_ops intel_iommu_ops = {
> @@ -5438,6 +5551,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.domain_free		= intel_iommu_domain_free,
>  	.attach_dev		= intel_iommu_attach_device,
>  	.detach_dev		= intel_iommu_detach_device,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.bind_pasid_table	= intel_iommu_bind_pasid_table,
> +	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
> +#endif
>  	.map			= intel_iommu_map,
>  	.unmap			= intel_iommu_unmap,
>  	.map_sg			= default_iommu_map_sg,
> diff --git a/include/linux/dma_remapping.h
> b/include/linux/dma_remapping.h
> index 9088407..85367b7 100644
> --- a/include/linux/dma_remapping.h
> +++ b/include/linux/dma_remapping.h
> @@ -27,6 +27,7 @@
> 
>  #define CONTEXT_DINVE		(1ULL << 8)
>  #define CONTEXT_PRS		(1ULL << 9)
> +#define CONTEXT_NESTE		(1ULL << 10)
>  #define CONTEXT_PASIDE		(1ULL << 11)
> 
>  struct intel_iommu;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-07-05  7:57           ` Tian, Kevin
  0 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2017-07-05  7:57 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan, Joerg Roedel
  Cc: Lan, Tianyu, Liu, Yi L, LKML, iommu, Jean Delvare, David Woodhouse

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Thursday, June 29, 2017 1:08 AM
> 
> On 28/06/17 17:09, Jacob Pan wrote:
> > On Wed, 28 Jun 2017 12:08:23 +0200
> > Joerg Roedel <joro@8bytes.org> wrote:
> >
> >> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
> >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> >>>
> >>> When a SVM capable device is assigned to a guest, the first level
> >>> page tables are owned by the guest and the guest PASID table
> >>> pointer is linked to the device context entry of the physical IOMMU.
> >>>
> >>> Host IOMMU driver has no knowledge of caching structure updates
> >>> unless the guest invalidation activities are passed down to the
> >>> host. The primary usage is derived from emulated IOMMU in the
> >>> guest, where QEMU can trap invalidation activities before pass them
> >>> down the host/physical IOMMU. There are IOMMU architectural
> >>> specific actions need to be taken which requires the generic APIs
> >>> introduced in this patch to have opaque data in the
> >>> tlb_invalidate_info argument.
> >>
> >> Which "IOMMU architectural specific actions" are you thinking of?
> >>
> > construction of queued invalidation descriptors, then submit them to
> > the IOMMU QI interface.
> >>> +int iommu_invalidate(struct iommu_domain *domain,
> >>> +		struct device *dev, struct tlb_invalidate_info
> >>> *inv_info) +{
> >>> +	int ret = 0;
> >>> +
> >>> +	if (unlikely(!domain->ops->invalidate))
> >>> +		return -ENODEV;
> >>> +
> >>> +	ret = domain->ops->invalidate(domain, dev, inv_info);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(iommu_invalidate);
> >>
> >> [...]
> >>
> >>> +struct tlb_invalidate_info {
> >>> +	__u32	model;
> >>> +	__u32	length;
> >>> +	__u8	opaque[];
> >>> +};
> >>
> >> This interface is aweful. It requires the user of a generic api to
> >> know details about the implementation behind to do anything useful.
> >>
> >> Please explain in more detail why this is needed. My feeling is that
> >> we can make this more generic with a small set of invalidation
> >> functions in the iommu-api.

A curious question here. Joreg, which part based on below information
could be generalized in your mind? Previously I also preferred to defining
a common structure. However later I realized there is little code logic
which can be further abstracted to use that structure, since the main
task here is just to construct vendor specific invalidation descriptor upon 
the request...

> >>
> > My thinking was that via configuration control, there will be unlikely
> > any mixed IOMMU models between pIOMMU and vIOMMU. We could
> have just
> > model specific data pass through layers of SW (QEMU, VFIO) for
> > performance reasons. We do have an earlier hybrid version that has
> > generic data and opaque raw data. Would the below work for all IOMMU
> > models?
> 
> For reference, this was also discussed in the initial posting of the series:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03452.html
> 
> At least for ARM SMMUv2 and v3, I think the invalidation format you
> propose should be sufficient, although "device_selective" should probably
> be "domain_selective". And maybe a flag field could contain relatively
> generic hints such as "only invalidate leaf table when page_selective".
> 
> Thanks,
> Jean
> 
> > https://www.spinics.net/lists/kvm/msg148798.html
> >
> > struct tlb_invalidate_info
> > {
> >         __u32   model;  /* Vendor number */
> >         __u8 granularity
> > #define DEVICE_SELECTVIE_INV    (1 << 0)
> > #define PAGE_SELECTIVE_INV      (1 << 0)
> > #define PASID_SELECTIVE_INV     (1 << 1)
> >         __u32 pasid;
> >         __u64 addr;
> >         __u64 size;
> >
> >         /* Since IOMMU format has already been validated for this table,
> >            the IOMMU driver knows that the following structure is in a
> >            format it knows */
> >         __u8 opaque[];
> > };
> >

I just gave some information in another thread:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00853.html

Below summarizes all the invalidation capabilities supported by Intel VTd:

Scope: All PASIDs, single PASID
for each PASID:
        all mappings, or page-selective mappings (addr, size)
invalidation target:
        IOTLB entries (leaf)
        paging structure cache (non-leaf)
        PASID cache (pasid->cr3)
invalidation hint:
        whether global pages are included
        drain reads/writes

(Jean, you may add ARM specific capabilities here)

If we want to define a common structure, go with defining a superset 
of all possible capabilities from all vendors (no opaque then) or only 
including a subset used by some common IOMMU abstraction?
The latter depends on what exactly need to be generalized which needs
to be solved first, otherwise it's difficult to judge why proposed format
is necessary and enough...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-07-05  7:57           ` Tian, Kevin
  0 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2017-07-05  7:57 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan, Joerg Roedel
  Cc: Lan, Tianyu, Liu, Yi L, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org]
> Sent: Thursday, June 29, 2017 1:08 AM
> 
> On 28/06/17 17:09, Jacob Pan wrote:
> > On Wed, 28 Jun 2017 12:08:23 +0200
> > Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
> >
> >> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
> >>> From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> >>>
> >>> When a SVM capable device is assigned to a guest, the first level
> >>> page tables are owned by the guest and the guest PASID table
> >>> pointer is linked to the device context entry of the physical IOMMU.
> >>>
> >>> Host IOMMU driver has no knowledge of caching structure updates
> >>> unless the guest invalidation activities are passed down to the
> >>> host. The primary usage is derived from emulated IOMMU in the
> >>> guest, where QEMU can trap invalidation activities before pass them
> >>> down the host/physical IOMMU. There are IOMMU architectural
> >>> specific actions need to be taken which requires the generic APIs
> >>> introduced in this patch to have opaque data in the
> >>> tlb_invalidate_info argument.
> >>
> >> Which "IOMMU architectural specific actions" are you thinking of?
> >>
> > construction of queued invalidation descriptors, then submit them to
> > the IOMMU QI interface.
> >>> +int iommu_invalidate(struct iommu_domain *domain,
> >>> +		struct device *dev, struct tlb_invalidate_info
> >>> *inv_info) +{
> >>> +	int ret = 0;
> >>> +
> >>> +	if (unlikely(!domain->ops->invalidate))
> >>> +		return -ENODEV;
> >>> +
> >>> +	ret = domain->ops->invalidate(domain, dev, inv_info);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(iommu_invalidate);
> >>
> >> [...]
> >>
> >>> +struct tlb_invalidate_info {
> >>> +	__u32	model;
> >>> +	__u32	length;
> >>> +	__u8	opaque[];
> >>> +};
> >>
> >> This interface is aweful. It requires the user of a generic api to
> >> know details about the implementation behind to do anything useful.
> >>
> >> Please explain in more detail why this is needed. My feeling is that
> >> we can make this more generic with a small set of invalidation
> >> functions in the iommu-api.

A curious question here. Joreg, which part based on below information
could be generalized in your mind? Previously I also preferred to defining
a common structure. However later I realized there is little code logic
which can be further abstracted to use that structure, since the main
task here is just to construct vendor specific invalidation descriptor upon 
the request...

> >>
> > My thinking was that via configuration control, there will be unlikely
> > any mixed IOMMU models between pIOMMU and vIOMMU. We could
> have just
> > model specific data pass through layers of SW (QEMU, VFIO) for
> > performance reasons. We do have an earlier hybrid version that has
> > generic data and opaque raw data. Would the below work for all IOMMU
> > models?
> 
> For reference, this was also discussed in the initial posting of the series:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03452.html
> 
> At least for ARM SMMUv2 and v3, I think the invalidation format you
> propose should be sufficient, although "device_selective" should probably
> be "domain_selective". And maybe a flag field could contain relatively
> generic hints such as "only invalidate leaf table when page_selective".
> 
> Thanks,
> Jean
> 
> > https://www.spinics.net/lists/kvm/msg148798.html
> >
> > struct tlb_invalidate_info
> > {
> >         __u32   model;  /* Vendor number */
> >         __u8 granularity
> > #define DEVICE_SELECTVIE_INV    (1 << 0)
> > #define PAGE_SELECTIVE_INV      (1 << 0)
> > #define PASID_SELECTIVE_INV     (1 << 1)
> >         __u32 pasid;
> >         __u64 addr;
> >         __u64 size;
> >
> >         /* Since IOMMU format has already been validated for this table,
> >            the IOMMU driver knows that the following structure is in a
> >            format it knows */
> >         __u8 opaque[];
> > };
> >

I just gave some information in another thread:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00853.html

Below summarizes all the invalidation capabilities supported by Intel VTd:

Scope: All PASIDs, single PASID
for each PASID:
        all mappings, or page-selective mappings (addr, size)
invalidation target:
        IOTLB entries (leaf)
        paging structure cache (non-leaf)
        PASID cache (pasid->cr3)
invalidation hint:
        whether global pages are included
        drain reads/writes

(Jean, you may add ARM specific capabilities here)

If we want to define a common structure, go with defining a superset 
of all possible capabilities from all vendors (no opaque then) or only 
including a subset used by some common IOMMU abstraction?
The latter depends on what exactly need to be generalized which needs
to be solved first, otherwise it's difficult to judge why proposed format
is necessary and enough...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-07-05 12:42             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 38+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-05 12:42 UTC (permalink / raw)
  To: Tian, Kevin, Jacob Pan, Joerg Roedel
  Cc: Lan, Tianyu, Liu, Yi L, LKML, iommu, Jean Delvare, David Woodhouse

On 05/07/17 08:57, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Thursday, June 29, 2017 1:08 AM
>>
>> On 28/06/17 17:09, Jacob Pan wrote:
>>> On Wed, 28 Jun 2017 12:08:23 +0200
>>> Joerg Roedel <joro@8bytes.org> wrote:
>>>
>>>> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
>>>>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
>>>>>
>>>>> When a SVM capable device is assigned to a guest, the first level
>>>>> page tables are owned by the guest and the guest PASID table
>>>>> pointer is linked to the device context entry of the physical IOMMU.
>>>>>
>>>>> Host IOMMU driver has no knowledge of caching structure updates
>>>>> unless the guest invalidation activities are passed down to the
>>>>> host. The primary usage is derived from emulated IOMMU in the
>>>>> guest, where QEMU can trap invalidation activities before pass them
>>>>> down the host/physical IOMMU. There are IOMMU architectural
>>>>> specific actions need to be taken which requires the generic APIs
>>>>> introduced in this patch to have opaque data in the
>>>>> tlb_invalidate_info argument.
>>>>
>>>> Which "IOMMU architectural specific actions" are you thinking of?
>>>>
>>> construction of queued invalidation descriptors, then submit them to
>>> the IOMMU QI interface.
>>>>> +int iommu_invalidate(struct iommu_domain *domain,
>>>>> +		struct device *dev, struct tlb_invalidate_info
>>>>> *inv_info) +{
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	if (unlikely(!domain->ops->invalidate))
>>>>> +		return -ENODEV;
>>>>> +
>>>>> +	ret = domain->ops->invalidate(domain, dev, inv_info);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(iommu_invalidate);
>>>>
>>>> [...]
>>>>
>>>>> +struct tlb_invalidate_info {
>>>>> +	__u32	model;
>>>>> +	__u32	length;
>>>>> +	__u8	opaque[];
>>>>> +};
>>>>
>>>> This interface is aweful. It requires the user of a generic api to
>>>> know details about the implementation behind to do anything useful.
>>>>
>>>> Please explain in more detail why this is needed. My feeling is that
>>>> we can make this more generic with a small set of invalidation
>>>> functions in the iommu-api.
> 
> A curious question here. Joreg, which part based on below information
> could be generalized in your mind? Previously I also preferred to defining
> a common structure. However later I realized there is little code logic
> which can be further abstracted to use that structure, since the main
> task here is just to construct vendor specific invalidation descriptor upon 
> the request...
> 
>>>>
>>> My thinking was that via configuration control, there will be unlikely
>>> any mixed IOMMU models between pIOMMU and vIOMMU. We could
>> have just
>>> model specific data pass through layers of SW (QEMU, VFIO) for
>>> performance reasons. We do have an earlier hybrid version that has
>>> generic data and opaque raw data. Would the below work for all IOMMU
>>> models?
>>
>> For reference, this was also discussed in the initial posting of the series:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03452.html
>>
>> At least for ARM SMMUv2 and v3, I think the invalidation format you
>> propose should be sufficient, although "device_selective" should probably
>> be "domain_selective". And maybe a flag field could contain relatively
>> generic hints such as "only invalidate leaf table when page_selective".
>>
>> Thanks,
>> Jean
>>
>>> https://www.spinics.net/lists/kvm/msg148798.html
>>>
>>> struct tlb_invalidate_info
>>> {
>>>         __u32   model;  /* Vendor number */
>>>         __u8 granularity
>>> #define DEVICE_SELECTVIE_INV    (1 << 0)
>>> #define PAGE_SELECTIVE_INV      (1 << 0)
>>> #define PASID_SELECTIVE_INV     (1 << 1)
>>>         __u32 pasid;
>>>         __u64 addr;
>>>         __u64 size;
>>>
>>>         /* Since IOMMU format has already been validated for this table,
>>>            the IOMMU driver knows that the following structure is in a
>>>            format it knows */
>>>         __u8 opaque[];
>>> };
>>>
> 
> I just gave some information in another thread:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00853.html
> 
> Below summarizes all the invalidation capabilities supported by Intel VTd:
> 
> Scope: All PASIDs, single PASID
> for each PASID:
>         all mappings, or page-selective mappings (addr, size)
> invalidation target:
>         IOTLB entries (leaf)
>         paging structure cache (non-leaf)
>         PASID cache (pasid->cr3)

> invalidation hint:
>         whether global pages are included
>         drain reads/writes
> 
> (Jean, you may add ARM specific capabilities here)

None so far, we don't have hints except for 'leaf', but future revisions
of the architecture are likely to add fields. And some implementations
might want the guest to specify an ASID instead of/in addition to the
PASID. (see my reply in the linked thread)

Thanks,
Jean

> If we want to define a common structure, go with defining a superset 
> of all possible capabilities from all vendors (no opaque then) or only 
> including a subset used by some common IOMMU abstraction?
> The latter depends on what exactly need to be generalized which needs
> to be solved first, otherwise it's difficult to judge why proposed format
> is necessary and enough...
> 
> Thanks
> Kevin
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-07-05 12:42             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 38+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-05 12:42 UTC (permalink / raw)
  To: Tian, Kevin, Jacob Pan, Joerg Roedel
  Cc: Lan, Tianyu, Liu, Yi L, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On 05/07/17 08:57, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org]
>> Sent: Thursday, June 29, 2017 1:08 AM
>>
>> On 28/06/17 17:09, Jacob Pan wrote:
>>> On Wed, 28 Jun 2017 12:08:23 +0200
>>> Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
>>>
>>>> On Tue, Jun 27, 2017 at 12:47:57PM -0700, Jacob Pan wrote:
>>>>> From: "Liu, Yi L" <yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
>>>>>
>>>>> When a SVM capable device is assigned to a guest, the first level
>>>>> page tables are owned by the guest and the guest PASID table
>>>>> pointer is linked to the device context entry of the physical IOMMU.
>>>>>
>>>>> Host IOMMU driver has no knowledge of caching structure updates
>>>>> unless the guest invalidation activities are passed down to the
>>>>> host. The primary usage is derived from emulated IOMMU in the
>>>>> guest, where QEMU can trap invalidation activities before pass them
>>>>> down the host/physical IOMMU. There are IOMMU architectural
>>>>> specific actions need to be taken which requires the generic APIs
>>>>> introduced in this patch to have opaque data in the
>>>>> tlb_invalidate_info argument.
>>>>
>>>> Which "IOMMU architectural specific actions" are you thinking of?
>>>>
>>> construction of queued invalidation descriptors, then submit them to
>>> the IOMMU QI interface.
>>>>> +int iommu_invalidate(struct iommu_domain *domain,
>>>>> +		struct device *dev, struct tlb_invalidate_info
>>>>> *inv_info) +{
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	if (unlikely(!domain->ops->invalidate))
>>>>> +		return -ENODEV;
>>>>> +
>>>>> +	ret = domain->ops->invalidate(domain, dev, inv_info);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(iommu_invalidate);
>>>>
>>>> [...]
>>>>
>>>>> +struct tlb_invalidate_info {
>>>>> +	__u32	model;
>>>>> +	__u32	length;
>>>>> +	__u8	opaque[];
>>>>> +};
>>>>
>>>> This interface is aweful. It requires the user of a generic api to
>>>> know details about the implementation behind to do anything useful.
>>>>
>>>> Please explain in more detail why this is needed. My feeling is that
>>>> we can make this more generic with a small set of invalidation
>>>> functions in the iommu-api.
> 
> A curious question here. Joreg, which part based on below information
> could be generalized in your mind? Previously I also preferred to defining
> a common structure. However later I realized there is little code logic
> which can be further abstracted to use that structure, since the main
> task here is just to construct vendor specific invalidation descriptor upon 
> the request...
> 
>>>>
>>> My thinking was that via configuration control, there will be unlikely
>>> any mixed IOMMU models between pIOMMU and vIOMMU. We could
>> have just
>>> model specific data pass through layers of SW (QEMU, VFIO) for
>>> performance reasons. We do have an earlier hybrid version that has
>>> generic data and opaque raw data. Would the below work for all IOMMU
>>> models?
>>
>> For reference, this was also discussed in the initial posting of the series:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03452.html
>>
>> At least for ARM SMMUv2 and v3, I think the invalidation format you
>> propose should be sufficient, although "device_selective" should probably
>> be "domain_selective". And maybe a flag field could contain relatively
>> generic hints such as "only invalidate leaf table when page_selective".
>>
>> Thanks,
>> Jean
>>
>>> https://www.spinics.net/lists/kvm/msg148798.html
>>>
>>> struct tlb_invalidate_info
>>> {
>>>         __u32   model;  /* Vendor number */
>>>         __u8 granularity
>>> #define DEVICE_SELECTVIE_INV    (1 << 0)
>>> #define PAGE_SELECTIVE_INV      (1 << 0)
>>> #define PASID_SELECTIVE_INV     (1 << 1)
>>>         __u32 pasid;
>>>         __u64 addr;
>>>         __u64 size;
>>>
>>>         /* Since IOMMU format has already been validated for this table,
>>>            the IOMMU driver knows that the following structure is in a
>>>            format it knows */
>>>         __u8 opaque[];
>>> };
>>>
> 
> I just gave some information in another thread:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00853.html
> 
> Below summarizes all the invalidation capabilities supported by Intel VTd:
> 
> Scope: All PASIDs, single PASID
> for each PASID:
>         all mappings, or page-selective mappings (addr, size)
> invalidation target:
>         IOTLB entries (leaf)
>         paging structure cache (non-leaf)
>         PASID cache (pasid->cr3)

> invalidation hint:
>         whether global pages are included
>         drain reads/writes
> 
> (Jean, you may add ARM specific capabilities here)

None so far, we don't have hints except for 'leaf', but future revisions
of the architecture are likely to add fields. And some implementations
might want the guest to specify an ASID instead of/in addition to the
PASID. (see my reply in the linked thread)

Thanks,
Jean

> If we want to define a common structure, go with defining a superset 
> of all possible capabilities from all vendors (no opaque then) or only 
> including a subset used by some common IOMMU abstraction?
> The latter depends on what exactly need to be generalized which needs
> to be solved first, otherwise it's difficult to judge why proposed format
> is necessary and enough...
> 
> Thanks
> Kevin
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-07-26  9:02             ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-07-26  9:02 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jean-Philippe Brucker, Jacob Pan, Lan, Tianyu, Liu, Yi L, LKML,
	iommu, Jean Delvare, David Woodhouse

On Wed, Jul 05, 2017 at 07:57:57AM +0000, Tian, Kevin wrote:
> > > struct tlb_invalidate_info
> > > {
> > >         __u32   model;  /* Vendor number */

I don't like to have a model-specifier here, as I don't think its
needed.

> > >         __u8 granularity
> > > #define DEVICE_SELECTVIE_INV    (1 << 0)
> > > #define PAGE_SELECTIVE_INV      (1 << 0)
> > > #define PASID_SELECTIVE_INV     (1 << 1)
> > >         __u32 pasid;
> > >         __u64 addr;
> > >         __u64 size;
> > >
> > >         /* Since IOMMU format has already been validated for this table,
> > >            the IOMMU driver knows that the following structure is in a
> > >            format it knows */
> > >         __u8 opaque[];
> > > };
> > >
> 
> I just gave some information in another thread:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00853.html
> 
> Below summarizes all the invalidation capabilities supported by Intel VTd:
> 
> Scope: All PASIDs, single PASID
> for each PASID:
>         all mappings, or page-selective mappings (addr, size)
> invalidation target:
>         IOTLB entries (leaf)
>         paging structure cache (non-leaf)
>         PASID cache (pasid->cr3)
> invalidation hint:
>         whether global pages are included
>         drain reads/writes

The AMD IOMMU for example supports similar flushing capabilities. They
are all based on a IOMMU-independent PCI standards (PASID, PRI, ATS),
and the interface can be designed around those standards instead of
IOMMU hardware implementations.

If a given hardware implementation does not support all of the above
flushing modes, it is always free to flush more than requested as
supported by its capabilities.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/9] iommu: Introduce iommu do invalidate API function
@ 2017-07-26  9:02             ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-07-26  9:02 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lan, Tianyu, Liu, Yi L, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Wed, Jul 05, 2017 at 07:57:57AM +0000, Tian, Kevin wrote:
> > > struct tlb_invalidate_info
> > > {
> > >         __u32   model;  /* Vendor number */

I don't like to have a model-specifier here, as I don't think its
needed.

> > >         __u8 granularity
> > > #define DEVICE_SELECTVIE_INV    (1 << 0)
> > > #define PAGE_SELECTIVE_INV      (1 << 0)
> > > #define PASID_SELECTIVE_INV     (1 << 1)
> > >         __u32 pasid;
> > >         __u64 addr;
> > >         __u64 size;
> > >
> > >         /* Since IOMMU format has already been validated for this table,
> > >            the IOMMU driver knows that the following structure is in a
> > >            format it knows */
> > >         __u8 opaque[];
> > > };
> > >
> 
> I just gave some information in another thread:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00853.html
> 
> Below summarizes all the invalidation capabilities supported by Intel VTd:
> 
> Scope: All PASIDs, single PASID
> for each PASID:
>         all mappings, or page-selective mappings (addr, size)
> invalidation target:
>         IOTLB entries (leaf)
>         paging structure cache (non-leaf)
>         PASID cache (pasid->cr3)
> invalidation hint:
>         whether global pages are included
>         drain reads/writes

The AMD IOMMU for example supports similar flushing capabilities. They
are all based on a IOMMU-independent PCI standards (PASID, PRI, ATS),
and the interface can be designed around those standards instead of
IOMMU hardware implementations.

If a given hardware implementation does not support all of the above
flushing modes, it is always free to flush more than requested as
supported by its capabilities.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
@ 2017-08-16  9:44   ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-08-16  9:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare

On Tue, Jun 27, 2017 at 12:47:54PM -0700, Jacob Pan wrote:
> At the top level, three new IOMMU interfaces are introduced:
>  - bind PASID table
>  - passdown invalidation
>  - per device IOMMU fault notification

As this will make a good topic, I want to make you guys aware that there
will be a VFIO/IOMMU/PCI uconf at this years Linux Plumbers Conference:

	https://marc.info/?l=linux-pci&m=150212875623808&w=2

Alex and I will be attending along with other maintainers, so this is a
great opportunity to discuss the design and any questions around it in
person and make progress much faster than possible by email.

It would be great if you guys could submit a proposal around this topic
and attend as well.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
@ 2017-08-16  9:44   ` Joerg Roedel
  0 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-08-16  9:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Lan Tianyu, Tian, Kevin, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Tue, Jun 27, 2017 at 12:47:54PM -0700, Jacob Pan wrote:
> At the top level, three new IOMMU interfaces are introduced:
>  - bind PASID table
>  - passdown invalidation
>  - per device IOMMU fault notification

As this will make a good topic, I want to make you guys aware that there
will be a VFIO/IOMMU/PCI uconf at this years Linux Plumbers Conference:

	https://marc.info/?l=linux-pci&m=150212875623808&w=2

Alex and I will be attending along with other maintainers, so this is a
great opportunity to discuss the design and any questions around it in
person and make progress much faster than possible by email.

It would be great if you guys could submit a proposal around this topic
and attend as well.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
@ 2017-08-16 15:14     ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-08-16 15:14 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare, jacob.jun.pan

On Wed, 16 Aug 2017 11:44:30 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> On Tue, Jun 27, 2017 at 12:47:54PM -0700, Jacob Pan wrote:
> > At the top level, three new IOMMU interfaces are introduced:
> >  - bind PASID table
> >  - passdown invalidation
> >  - per device IOMMU fault notification  
> 
> As this will make a good topic, I want to make you guys aware that
> there will be a VFIO/IOMMU/PCI uconf at this years Linux Plumbers
> Conference:
> 
> 	https://marc.info/?l=linux-pci&m=150212875623808&w=2
> 
> Alex and I will be attending along with other maintainers, so this is
> a great opportunity to discuss the design and any questions around it
> in person and make progress much faster than possible by email.
> 
> It would be great if you guys could submit a proposal around this
> topic and attend as well.
> 
> 
Perfect, I submitted a topic on this at the microconf wiki page a
couple of weeks ago, plan to be at LPC too.
> Regards,
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
@ 2017-08-16 15:14     ` Jacob Pan
  0 siblings, 0 replies; 38+ messages in thread
From: Jacob Pan @ 2017-08-16 15:14 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lan Tianyu, Tian, Kevin, LKML,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean Delvare,
	David Woodhouse

On Wed, 16 Aug 2017 11:44:30 +0200
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:

> On Tue, Jun 27, 2017 at 12:47:54PM -0700, Jacob Pan wrote:
> > At the top level, three new IOMMU interfaces are introduced:
> >  - bind PASID table
> >  - passdown invalidation
> >  - per device IOMMU fault notification  
> 
> As this will make a good topic, I want to make you guys aware that
> there will be a VFIO/IOMMU/PCI uconf at this years Linux Plumbers
> Conference:
> 
> 	https://marc.info/?l=linux-pci&m=150212875623808&w=2
> 
> Alex and I will be attending along with other maintainers, so this is
> a great opportunity to discuss the design and any questions around it
> in person and make progress much faster than possible by email.
> 
> It would be great if you guys could submit a proposal around this
> topic and attend as well.
> 
> 
Perfect, I submitted a topic on this at the microconf wiki page a
couple of weeks ago, plan to be at LPC too.
> Regards,
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/9] IOMMU driver support for shared virtual memory virtualization
  2017-08-16 15:14     ` Jacob Pan
  (?)
@ 2017-08-16 16:23     ` Joerg Roedel
  -1 siblings, 0 replies; 38+ messages in thread
From: Joerg Roedel @ 2017-08-16 16:23 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, David Woodhouse, Liu, Yi L, Lan Tianyu, Tian, Kevin,
	Raj Ashok, Alex Williamson, Jean Delvare

On Wed, Aug 16, 2017 at 08:14:05AM -0700, Jacob Pan wrote:
> Perfect, I submitted a topic on this at the microconf wiki page a
> couple of weeks ago, plan to be at LPC too.

Great, I have seen your entry on the wiki page :) Now would be a perfect
time to add this to the Proposal list posted here:

	https://linuxplumbersconf.org/2017/ocw/events/LPC2017/tracks/636

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2017-08-16 16:23 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-27 19:47 [RFC 0/9] IOMMU driver support for shared virtual memory virtualization Jacob Pan
2017-06-27 19:47 ` Jacob Pan
2017-06-27 19:47 ` [PATCH 1/9] iommu: Introduce bind_pasid_table API function Jacob Pan
2017-06-28  9:57   ` Joerg Roedel
2017-06-28  9:57     ` Joerg Roedel
2017-06-27 19:47 ` [PATCH 2/9] iommu/vt-d: add bind_pasid_table function Jacob Pan
2017-06-27 19:47   ` Jacob Pan
2017-06-28 10:02   ` Joerg Roedel
2017-06-28 10:02     ` Joerg Roedel
2017-07-05  7:38   ` Tian, Kevin
2017-07-05  7:38     ` Tian, Kevin
2017-06-27 19:47 ` [PATCH 3/9] iommu: Introduce iommu do invalidate API function Jacob Pan
2017-06-28 10:08   ` Joerg Roedel
2017-06-28 16:09     ` Jacob Pan
2017-06-28 16:09       ` Jacob Pan
2017-06-28 17:07       ` Jean-Philippe Brucker
2017-06-28 17:07         ` Jean-Philippe Brucker
2017-07-05  7:57         ` Tian, Kevin
2017-07-05  7:57           ` Tian, Kevin
2017-07-05 12:42           ` Jean-Philippe Brucker
2017-07-05 12:42             ` Jean-Philippe Brucker
2017-07-26  9:02           ` Joerg Roedel
2017-07-26  9:02             ` Joerg Roedel
2017-06-27 19:47 ` [PATCH 4/9] iommu/vt-d: Add iommu do invalidate function Jacob Pan
2017-06-27 19:47 ` [PATCH 5/9] iommu: Introduce fault notifier API Jacob Pan
2017-06-28 10:16   ` Joerg Roedel
2017-06-28 10:16     ` Joerg Roedel
2017-06-28 16:16     ` Jacob Pan
2017-06-28 16:16       ` Jacob Pan
2017-06-27 19:48 ` [PATCH 6/9] iommu/vt-d: track device with pasid table bond to a guest Jacob Pan
2017-06-27 19:48 ` [PATCH 7/9] iommu/dmar: notify unrecoverable faults Jacob Pan
2017-06-27 19:48 ` [PATCH 8/9] iommu/intel-svm: notify page request to guest Jacob Pan
2017-06-27 19:48 ` [PATCH 9/9] iommu/intel-svm: replace dev ops with generic fault notifier Jacob Pan
2017-08-16  9:44 ` [RFC 0/9] IOMMU driver support for shared virtual memory virtualization Joerg Roedel
2017-08-16  9:44   ` Joerg Roedel
2017-08-16 15:14   ` Jacob Pan
2017-08-16 15:14     ` Jacob Pan
2017-08-16 16:23     ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.