kvm.vger.kernel.org archive mirror
* [PATCH v4 0/6] PCI: support the ATS capability
@ 2009-03-23  7:58 Yu Zhao
  2009-03-23  7:58 ` [PATCH v4 1/6] " Yu Zhao
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:58 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

This patch series implements Address Translation Service support for
the Intel IOMMU. A PCIe Endpoint that supports the ATS capability can
request DMA address translations from the IOMMU and cache the
translations itself. This alleviates IOMMU TLB pressure and improves
hardware performance in I/O virtualization environments.

ATS is one of the PCI-SIG I/O Virtualization (IOV) specifications. The
spec can be found at: http://www.pcisig.com/specifications/iov/ats/
(it requires membership).


Changelog:
v3 -> v4
  1, coding style fixes (Grant Grundler)
  2, support the Virtual Function ATS capability

v2 -> v3
  1, throw error message if VT-d hardware detects invalid descriptor
     on Queued Invalidation interface (David Woodhouse)
  2, avoid using pci_find_ext_capability every time when reading ATS
     Invalidate Queue Depth (Matthew Wilcox)

v1 -> v2
  added 'static' prefix to a local LIST_HEAD (Andrew Morton)


Yu Zhao (6):
  PCI: support the ATS capability
  PCI: handle Virtual Function ATS enabling
  VT-d: parse ATSR in DMA Remapping Reporting Structure
  VT-d: add device IOTLB invalidation support
  VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  VT-d: support the device IOTLB

 drivers/pci/dmar.c          |  189 +++++++++++++++++++++++++++++++++++++++---
 drivers/pci/intel-iommu.c   |  139 ++++++++++++++++++++++++++------
 drivers/pci/iov.c           |  155 ++++++++++++++++++++++++++++++++++--
 drivers/pci/pci.h           |   39 +++++++++
 include/linux/dmar.h        |    9 ++
 include/linux/intel-iommu.h |   16 ++++-
 include/linux/pci.h         |    2 +
 include/linux/pci_regs.h    |   10 +++
 8 files changed, 514 insertions(+), 45 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/6] PCI: support the ATS capability
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
@ 2009-03-23  7:58 ` Yu Zhao
  2009-03-23  7:58 ` [PATCH v4 2/6] PCI: handle Virtual Function ATS enabling Yu Zhao
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:58 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

The PCIe ATS capability allows an Endpoint to request DMA address
translations from the IOMMU and cache them on the device side, which
alleviates IOMMU TLB pressure and improves hardware performance in
I/O virtualization environments.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/iov.c        |  105 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h        |   37 ++++++++++++++++
 include/linux/pci.h      |    2 +
 include/linux/pci_regs.h |   10 ++++
 4 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 7227efc..8a9817c 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -5,6 +5,7 @@
  *
  * PCI Express I/O Virtualization (IOV) support.
  *   Single Root IOV 1.0
+ *   Address Translation Service 1.0
  */
 
 #include <linux/pci.h>
@@ -678,3 +679,107 @@ irqreturn_t pci_sriov_migration(struct pci_dev *dev)
 	return sriov_migration(dev) ? IRQ_HANDLED : IRQ_NONE;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_migration);
+
+static int ats_alloc_one(struct pci_dev *dev, int pgshift)
+{
+	int pos;
+	u16 cap;
+	struct pci_ats *ats;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return -ENODEV;
+
+	ats = kzalloc(sizeof(*ats), GFP_KERNEL);
+	if (!ats)
+		return -ENOMEM;
+
+	ats->pos = pos;
+	ats->stu = pgshift;
+	pci_read_config_word(dev, pos + PCI_ATS_CAP, &cap);
+	ats->qdep = PCI_ATS_CAP_QDEP(cap) ? PCI_ATS_CAP_QDEP(cap) :
+					    PCI_ATS_MAX_QDEP;
+	dev->ats = ats;
+
+	return 0;
+}
+
+static void ats_free_one(struct pci_dev *dev)
+{
+	kfree(dev->ats);
+	dev->ats = NULL;
+}
+
+/**
+ * pci_enable_ats - enable the ATS capability
+ * @dev: the PCI device
+ * @pgshift: the IOMMU page shift
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_enable_ats(struct pci_dev *dev, int pgshift)
+{
+	int rc;
+	u16 ctrl;
+
+	BUG_ON(dev->ats);
+
+	if (pgshift < PCI_ATS_MIN_STU)
+		return -EINVAL;
+
+	rc = ats_alloc_one(dev, pgshift);
+	if (rc)
+		return rc;
+
+	ctrl = PCI_ATS_CTRL_ENABLE;
+	ctrl |= PCI_ATS_CTRL_STU(pgshift - PCI_ATS_MIN_STU);
+	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
+
+	return 0;
+}
+
+/**
+ * pci_disable_ats - disable the ATS capability
+ * @dev: the PCI device
+ */
+void pci_disable_ats(struct pci_dev *dev)
+{
+	u16 ctrl;
+
+	BUG_ON(!dev->ats);
+
+	pci_read_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, &ctrl);
+	ctrl &= ~PCI_ATS_CTRL_ENABLE;
+	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
+
+	ats_free_one(dev);
+}
+
+/**
+ * pci_ats_queue_depth - query the ATS Invalidate Queue Depth
+ * @dev: the PCI device
+ *
+ * Returns the queue depth on success, or negative on failure.
+ *
+ * The ATS spec uses 0 in the Invalidate Queue Depth field to
+ * indicate that the function can accept 32 Invalidate Requests.
+ * But here we use the `real' values (i.e. 1~32) for the Queue
+ * Depth.
+ */
+int pci_ats_queue_depth(struct pci_dev *dev)
+{
+	int pos;
+	u16 cap;
+
+	if (dev->ats)
+		return dev->ats->qdep;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return -ENODEV;
+
+	pci_read_config_word(dev, pos + PCI_ATS_CAP, &cap);
+
+	return PCI_ATS_CAP_QDEP(cap) ? PCI_ATS_CAP_QDEP(cap) :
+				       PCI_ATS_MAX_QDEP;
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index dd7c63f..9f0db6a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -218,6 +218,13 @@ struct pci_sriov {
 	u8 __iomem *mstate;	/* VF Migration State Array */
 };
 
+/* Address Translation Service */
+struct pci_ats {
+	int pos;	/* capability position */
+	int stu;	/* Smallest Translation Unit */
+	int qdep;	/* Invalidate Queue Depth */
+};
+
 #ifdef CONFIG_PCI_IOV
 extern int pci_iov_init(struct pci_dev *dev);
 extern void pci_iov_release(struct pci_dev *dev);
@@ -225,6 +232,20 @@ extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 				enum pci_bar_type *type);
 extern void pci_restore_iov_state(struct pci_dev *dev);
 extern int pci_iov_bus_range(struct pci_bus *bus);
+
+extern int pci_enable_ats(struct pci_dev *dev, int pgshift);
+extern void pci_disable_ats(struct pci_dev *dev);
+extern int pci_ats_queue_depth(struct pci_dev *dev);
+/**
+ * pci_ats_enabled - query the ATS status
+ * @dev: the PCI device
+ *
+ * Returns 1 if ATS capability is enabled, or 0 if not.
+ */
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+	return !!dev->ats;
+}
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -246,6 +267,22 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 {
 	return 0;
 }
+
+static inline int pci_enable_ats(struct pci_dev *dev, int pgshift)
+{
+	return -ENODEV;
+}
+static inline void pci_disable_ats(struct pci_dev *dev)
+{
+}
+static inline int pci_ats_queue_depth(struct pci_dev *dev)
+{
+	return -ENODEV;
+}
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+	return 0;
+}
 #endif /* CONFIG_PCI_IOV */
 
 #endif /* DRIVERS_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index df78327..de80acd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,7 @@ struct pci_cap_saved_state {
 struct pcie_link_state;
 struct pci_vpd;
 struct pci_sriov;
+struct pci_ats;
 
 /*
  * The pci_dev structure is used to describe PCI devices.
@@ -285,6 +286,7 @@ struct pci_dev {
 		struct pci_sriov *sriov;	/* SR-IOV capability related */
 		struct pci_dev *physfn;	/* the PF this VF is associated with */
 	};
+	struct pci_ats	*ats;	/* Address Translation Service */
 #endif
 };
 
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index 4ce5eb0..999de5f 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -499,6 +499,7 @@
 #define PCI_EXT_CAP_ID_DSN	3
 #define PCI_EXT_CAP_ID_PWR	4
 #define PCI_EXT_CAP_ID_ARI	14
+#define PCI_EXT_CAP_ID_ATS	15
 #define PCI_EXT_CAP_ID_SRIOV	16
 
 /* Advanced Error Reporting */
@@ -617,6 +618,15 @@
 #define  PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
 #define  PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7) /* Function Group */
 
+/* Address Translation Service */
+#define PCI_ATS_CAP		0x04	/* ATS Capability Register */
+#define  PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
+#define  PCI_ATS_MAX_QDEP	32	/* Max Invalidate Queue Depth */
+#define PCI_ATS_CTRL		0x06	/* ATS Control Register */
+#define  PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
+#define  PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
+#define  PCI_ATS_MIN_STU	12	/* shift of minimum STU block */
+
 /* Single Root I/O Virtualization */
 #define PCI_SRIOV_CAP		0x04	/* SR-IOV Capabilities */
 #define  PCI_SRIOV_CAP_VFM	0x01	/* VF Migration Capable */
-- 
1.5.6.4


* [PATCH v4 2/6] PCI: handle Virtual Function ATS enabling
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
  2009-03-23  7:58 ` [PATCH v4 1/6] " Yu Zhao
@ 2009-03-23  7:58 ` Yu Zhao
  2009-03-23  7:58 ` [PATCH v4 3/6] VT-d: parse ATSR in DMA Remapping Reporting Structure Yu Zhao
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:58 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

The SR-IOV spec requires the Smallest Translation Unit and the
Invalidate Queue Depth fields in the Virtual Function ATS capability
to be hardwired to 0. If a function is a Virtual Function, check and
set its Physical Function's STU before enabling its ATS capability.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/iov.c |   66 +++++++++++++++++++++++++++++++++++++++++-----------
 drivers/pci/pci.h |    4 ++-
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 8a9817c..0bf23fc 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -491,10 +491,10 @@ found:
 
 	if (pdev)
 		iov->dev = pci_dev_get(pdev);
-	else {
+	else
 		iov->dev = dev;
-		mutex_init(&iov->lock);
-	}
+
+	mutex_init(&iov->lock);
 
 	dev->sriov = iov;
 	dev->is_physfn = 1;
@@ -514,11 +514,11 @@ static void sriov_release(struct pci_dev *dev)
 {
 	BUG_ON(dev->sriov->nr_virtfn);
 
-	if (dev == dev->sriov->dev)
-		mutex_destroy(&dev->sriov->lock);
-	else
+	if (dev != dev->sriov->dev)
 		pci_dev_put(dev->sriov->dev);
 
+	mutex_destroy(&dev->sriov->lock);
+
 	kfree(dev->sriov);
 	dev->sriov = NULL;
 }
@@ -722,19 +722,40 @@ int pci_enable_ats(struct pci_dev *dev, int pgshift)
 	int rc;
 	u16 ctrl;
 
-	BUG_ON(dev->ats);
+	BUG_ON(dev->ats && dev->ats->is_enabled);
 
 	if (pgshift < PCI_ATS_MIN_STU)
 		return -EINVAL;
 
-	rc = ats_alloc_one(dev, pgshift);
-	if (rc)
-		return rc;
+	if (dev->is_physfn || dev->is_virtfn) {
+		struct pci_dev *pdev = dev->is_physfn ? dev : dev->physfn;
+
+		mutex_lock(&pdev->sriov->lock);
+		if (pdev->ats)
+			rc = pdev->ats->stu == pgshift ? 0 : -EINVAL;
+		else
+			rc = ats_alloc_one(pdev, pgshift);
+
+		if (!rc)
+			pdev->ats->ref_cnt++;
+		mutex_unlock(&pdev->sriov->lock);
+		if (rc)
+			return rc;
+	}
+
+	if (!dev->is_physfn) {
+		rc = ats_alloc_one(dev, pgshift);
+		if (rc)
+			return rc;
+	}
 
 	ctrl = PCI_ATS_CTRL_ENABLE;
-	ctrl |= PCI_ATS_CTRL_STU(pgshift - PCI_ATS_MIN_STU);
+	if (!dev->is_virtfn)
+		ctrl |= PCI_ATS_CTRL_STU(pgshift - PCI_ATS_MIN_STU);
 	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
 
+	dev->ats->is_enabled = 1;
+
 	return 0;
 }
 
@@ -746,13 +767,26 @@ void pci_disable_ats(struct pci_dev *dev)
 {
 	u16 ctrl;
 
-	BUG_ON(!dev->ats);
+	BUG_ON(!dev->ats || !dev->ats->is_enabled);
 
 	pci_read_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, &ctrl);
 	ctrl &= ~PCI_ATS_CTRL_ENABLE;
 	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
 
-	ats_free_one(dev);
+	dev->ats->is_enabled = 0;
+
+	if (dev->is_physfn || dev->is_virtfn) {
+		struct pci_dev *pdev = dev->is_physfn ? dev : dev->physfn;
+
+		mutex_lock(&pdev->sriov->lock);
+		pdev->ats->ref_cnt--;
+		if (!pdev->ats->ref_cnt)
+			ats_free_one(pdev);
+		mutex_unlock(&pdev->sriov->lock);
+	}
+
+	if (!dev->is_physfn)
+		ats_free_one(dev);
 }
 
 /**
@@ -764,13 +798,17 @@ void pci_disable_ats(struct pci_dev *dev)
  * The ATS spec uses 0 in the Invalidate Queue Depth field to
  * indicate that the function can accept 32 Invalidate Requests.
  * But here we use the `real' values (i.e. 1~32) for the Queue
- * Depth.
+ * Depth; and 0 indicates the function shares the Queue with
+ * other functions (doesn't exclusively own a Queue).
  */
 int pci_ats_queue_depth(struct pci_dev *dev)
 {
 	int pos;
 	u16 cap;
 
+	if (dev->is_virtfn)
+		return 0;
+
 	if (dev->ats)
 		return dev->ats->qdep;
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9f0db6a..8ecd185 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -223,6 +223,8 @@ struct pci_ats {
 	int pos;	/* capability position */
 	int stu;	/* Smallest Translation Unit */
 	int qdep;	/* Invalidate Queue Depth */
+	int ref_cnt;	/* Physical Function reference count */
+	int is_enabled:1;	/* Enable bit is set */
 };
 
 #ifdef CONFIG_PCI_IOV
@@ -244,7 +246,7 @@ extern int pci_ats_queue_depth(struct pci_dev *dev);
  */
 static inline int pci_ats_enabled(struct pci_dev *dev)
 {
-	return !!dev->ats;
+	return dev->ats && dev->ats->is_enabled;
 }
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
-- 
1.5.6.4


* [PATCH v4 3/6] VT-d: parse ATSR in DMA Remapping Reporting Structure
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
  2009-03-23  7:58 ` [PATCH v4 1/6] " Yu Zhao
  2009-03-23  7:58 ` [PATCH v4 2/6] PCI: handle Virtual Function ATS enabling Yu Zhao
@ 2009-03-23  7:58 ` Yu Zhao
  2009-03-23  7:59 ` [PATCH v4 4/6] VT-d: add device IOTLB invalidation support Yu Zhao
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:58 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

Parse the Root Port ATS Capability Reporting Structure in the DMA
Remapping Reporting Structure ACPI table.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/dmar.c          |  112 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/dmar.h        |    9 ++++
 include/linux/intel-iommu.h |    1 +
 3 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 26c536b..106bc45 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -254,6 +254,84 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
 	}
 	return ret;
 }
+
+static LIST_HEAD(dmar_atsr_units);
+
+static int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
+{
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	atsr = container_of(hdr, struct acpi_dmar_atsr, header);
+	atsru = kzalloc(sizeof(*atsru), GFP_KERNEL);
+	if (!atsru)
+		return -ENOMEM;
+
+	atsru->hdr = hdr;
+	atsru->include_all = atsr->flags & 0x1;
+
+	list_add(&atsru->list, &dmar_atsr_units);
+
+	return 0;
+}
+
+static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
+{
+	int rc;
+	struct acpi_dmar_atsr *atsr;
+
+	if (atsru->include_all)
+		return 0;
+
+	atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+	rc = dmar_parse_dev_scope((void *)(atsr + 1),
+				(void *)atsr + atsr->header.length,
+				&atsru->devices_cnt, &atsru->devices,
+				atsr->segment);
+	if (rc || !atsru->devices_cnt) {
+		list_del(&atsru->list);
+		kfree(atsru);
+	}
+
+	return rc;
+}
+
+int dmar_find_matched_atsr_unit(struct pci_dev *dev)
+{
+	int i;
+	struct pci_bus *bus;
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	list_for_each_entry(atsru, &dmar_atsr_units, list) {
+		atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+		if (atsr->segment == pci_domain_nr(dev->bus))
+			goto found;
+	}
+
+	return 0;
+
+found:
+	for (bus = dev->bus; bus; bus = bus->parent) {
+		struct pci_dev *bridge = bus->self;
+
+		if (!bridge || !bridge->is_pcie ||
+		    bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+			return 0;
+
+		if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+			for (i = 0; i < atsru->devices_cnt; i++)
+				if (atsru->devices[i] == bridge)
+					return 1;
+			break;
+		}
+	}
+
+	if (atsru->include_all)
+		return 1;
+
+	return 0;
+}
 #endif
 
 static void __init
@@ -261,22 +339,28 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
 {
 	struct acpi_dmar_hardware_unit *drhd;
 	struct acpi_dmar_reserved_memory *rmrr;
+	struct acpi_dmar_atsr *atsr;
 
 	switch (header->type) {
 	case ACPI_DMAR_TYPE_HARDWARE_UNIT:
-		drhd = (struct acpi_dmar_hardware_unit *)header;
+		drhd = container_of(header, struct acpi_dmar_hardware_unit,
+				    header);
 		printk (KERN_INFO PREFIX
-			"DRHD (flags: 0x%08x)base: 0x%016Lx\n",
-			drhd->flags, (unsigned long long)drhd->address);
+			"DRHD base: %#016Lx flags: %#x\n",
+			(unsigned long long)drhd->address, drhd->flags);
 		break;
 	case ACPI_DMAR_TYPE_RESERVED_MEMORY:
-		rmrr = (struct acpi_dmar_reserved_memory *)header;
-
+		rmrr = container_of(header, struct acpi_dmar_reserved_memory,
+				    header);
 		printk (KERN_INFO PREFIX
-			"RMRR base: 0x%016Lx end: 0x%016Lx\n",
+			"RMRR base: %#016Lx end: %#016Lx\n",
 			(unsigned long long)rmrr->base_address,
 			(unsigned long long)rmrr->end_address);
 		break;
+	case ACPI_DMAR_TYPE_ATSR:
+		atsr = container_of(header, struct acpi_dmar_atsr, header);
+		printk(KERN_INFO PREFIX "ATSR flags: %#x\n", atsr->flags);
+		break;
 	}
 }
 
@@ -349,6 +433,11 @@ parse_dmar_table(void)
 			ret = dmar_parse_one_rmrr(entry_header);
 #endif
 			break;
+		case ACPI_DMAR_TYPE_ATSR:
+#ifdef CONFIG_DMAR
+			ret = dmar_parse_one_atsr(entry_header);
+#endif
+			break;
 		default:
 			printk(KERN_WARNING PREFIX
 				"Unknown DMAR structure type\n");
@@ -417,11 +506,19 @@ int __init dmar_dev_scope_init(void)
 #ifdef CONFIG_DMAR
 	{
 		struct dmar_rmrr_unit *rmrr, *rmrr_n;
+		struct dmar_atsr_unit *atsr, *atsr_n;
+
 		list_for_each_entry_safe(rmrr, rmrr_n, &dmar_rmrr_units, list) {
 			ret = rmrr_parse_dev(rmrr);
 			if (ret)
 				return ret;
 		}
+
+		list_for_each_entry_safe(atsr, atsr_n, &dmar_atsr_units, list) {
+			ret = atsr_parse_dev(atsr);
+			if (ret)
+				return ret;
+		}
 	}
 #endif
 
@@ -454,6 +551,9 @@ int __init dmar_table_init(void)
 #ifdef CONFIG_DMAR
 	if (list_empty(&dmar_rmrr_units))
 		printk(KERN_INFO PREFIX "No RMRR found\n");
+
+	if (list_empty(&dmar_atsr_units))
+		printk(KERN_INFO PREFIX "No ATSR found\n");
 #endif
 
 #ifdef CONFIG_INTR_REMAP
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f284407..d3a1234 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -142,6 +142,15 @@ struct dmar_rmrr_unit {
 
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
+
+struct dmar_atsr_unit {
+	struct list_head list;		/* list of ATSR units */
+	struct acpi_dmar_header *hdr;	/* ACPI header */
+	struct pci_dev **devices;	/* target devices */
+	int devices_cnt;		/* target device count */
+	u8 include_all:1;		/* include all ports */
+};
+
 /* Intel DMAR  initialization functions */
 extern int intel_iommu_init(void);
 #else
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index d2e3cbf..660a7f4 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -317,6 +317,7 @@ static inline void __iommu_flush_cache(
 }
 
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
+extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
 
 extern int alloc_iommu(struct dmar_drhd_unit *drhd);
 extern void free_iommu(struct intel_iommu *iommu);
-- 
1.5.6.4



* [PATCH v4 4/6] VT-d: add device IOTLB invalidation support
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
                   ` (2 preceding siblings ...)
  2009-03-23  7:58 ` [PATCH v4 3/6] VT-d: parse ATSR in DMA Remapping Reporting Structure Yu Zhao
@ 2009-03-23  7:59 ` Yu Zhao
  2009-03-29  5:19   ` Grant Grundler
  2009-03-23  7:59 ` [PATCH v4 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps Yu Zhao
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:59 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

Support device IOTLB invalidation to flush the translations cached
in the Endpoint.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/dmar.c          |   77 ++++++++++++++++++++++++++++++++++++++----
 include/linux/intel-iommu.h |   14 +++++++-
 2 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 106bc45..494b167 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -674,7 +674,8 @@ void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-	while (qi->desc_status[qi->free_tail] == QI_DONE) {
+	while (qi->desc_status[qi->free_tail] == QI_DONE ||
+	       qi->desc_status[qi->free_tail] == QI_ABORT) {
 		qi->desc_status[qi->free_tail] = QI_FREE;
 		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
 		qi->free_cnt++;
@@ -684,10 +685,13 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 static int qi_check_fault(struct intel_iommu *iommu, int index)
 {
 	u32 fault;
-	int head;
+	int head, tail;
 	struct q_inval *qi = iommu->qi;
 	int wait_index = (index + 1) % QI_LENGTH;
 
+	if (qi->desc_status[wait_index] == QI_ABORT)
+		return -EAGAIN;
+
 	fault = readl(iommu->reg + DMAR_FSTS_REG);
 
 	/*
@@ -697,7 +701,11 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 	 */
 	if (fault & DMA_FSTS_IQE) {
 		head = readl(iommu->reg + DMAR_IQH_REG);
-		if ((head >> 4) == index) {
+		if ((head >> DMAR_IQ_OFFSET) == index) {
+			printk(KERN_ERR "VT-d detected invalid descriptor: "
+				"low=%llx, high=%llx\n",
+				(unsigned long long)qi->desc[index].low,
+				(unsigned long long)qi->desc[index].high);
 			memcpy(&qi->desc[index], &qi->desc[wait_index],
 					sizeof(struct qi_desc));
 			__iommu_flush_cache(iommu, &qi->desc[index],
@@ -707,6 +715,32 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 		}
 	}
 
+	/*
+	 * If ITE happens, all pending wait_desc commands are aborted.
+	 * No new descriptors are fetched until the ITE is cleared.
+	 */
+	if (fault & DMA_FSTS_ITE) {
+		head = readl(iommu->reg + DMAR_IQH_REG);
+		head = ((head >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+		head |= 1;
+		tail = readl(iommu->reg + DMAR_IQT_REG);
+		tail = ((tail >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+
+		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
+
+		do {
+			if (qi->desc_status[head] == QI_IN_USE)
+				qi->desc_status[head] = QI_ABORT;
+			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+		} while (head != tail);
+
+		if (qi->desc_status[wait_index] == QI_ABORT)
+			return -EAGAIN;
+	}
+
+	if (fault & DMA_FSTS_ICE)
+		writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
+
 	return 0;
 }
 
@@ -716,7 +750,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
  */
 int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
-	int rc = 0;
+	int rc;
 	struct q_inval *qi = iommu->qi;
 	struct qi_desc *hw, wait_desc;
 	int wait_index, index;
@@ -727,6 +761,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
 	hw = qi->desc;
 
+restart:
+	rc = 0;
+
 	spin_lock_irqsave(&qi->q_lock, flags);
 	while (qi->free_cnt < 3) {
 		spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -757,7 +794,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	 * update the HW tail register indicating the presence of
 	 * new descriptors.
 	 */
-	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
+	writel(qi->free_head << DMAR_IQ_OFFSET, iommu->reg + DMAR_IQT_REG);
 
 	while (qi->desc_status[wait_index] != QI_DONE) {
 		/*
@@ -769,18 +806,21 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		 */
 		rc = qi_check_fault(iommu, index);
 		if (rc)
-			goto out;
+			break;
 
 		spin_unlock(&qi->q_lock);
 		cpu_relax();
 		spin_lock(&qi->q_lock);
 	}
-out:
-	qi->desc_status[index] = qi->desc_status[wait_index] = QI_DONE;
+
+	qi->desc_status[index] = QI_DONE;
 
 	reclaim_free_desc(qi);
 	spin_unlock_irqrestore(&qi->q_lock, flags);
 
+	if (rc == -EAGAIN)
+		goto restart;
+
 	return rc;
 }
 
@@ -847,6 +887,27 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	return qi_submit_sync(&desc, iommu);
 }
 
+int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
+			u64 addr, unsigned mask)
+{
+	struct qi_desc desc;
+
+	if (mask) {
+		BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
+		addr |= (1 << (VTD_PAGE_SHIFT + mask - 1)) - 1;
+		desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
+	} else
+		desc.high = QI_DEV_IOTLB_ADDR(addr);
+
+	if (qdep >= QI_DEV_IOTLB_MAX_INVS)
+		qdep = 0;
+
+	desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
+		   QI_DIOTLB_TYPE;
+
+	return qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Enable Queued Invalidation interface. This is a must to support
  * interrupt-remapping. Also used by DMA-remapping, which replaces
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 660a7f4..a32b3db 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -53,6 +53,7 @@
 #define	DMAR_PHMLIMIT_REG 0x78	/* pmrr high limit */
 #define DMAR_IQH_REG	0x80	/* Invalidation queue head register */
 #define DMAR_IQT_REG	0x88	/* Invalidation queue tail register */
+#define DMAR_IQ_OFFSET	4	/* Invalidation queue head/tail offset */
 #define DMAR_IQA_REG	0x90	/* Invalidation queue addr register */
 #define DMAR_ICS_REG	0x98	/* Invalidation complete status register */
 #define DMAR_IRTA_REG	0xb8    /* Interrupt remapping table addr register */
@@ -195,6 +196,8 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 #define DMA_FSTS_PPF ((u32)2)
 #define DMA_FSTS_PFO ((u32)1)
 #define DMA_FSTS_IQE (1 << 4)
+#define DMA_FSTS_ICE (1 << 5)
+#define DMA_FSTS_ITE (1 << 6)
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
@@ -223,7 +226,8 @@ do {									\
 enum {
 	QI_FREE,
 	QI_IN_USE,
-	QI_DONE
+	QI_DONE,
+	QI_ABORT
 };
 
 #define QI_CC_TYPE		0x1
@@ -252,6 +256,12 @@ enum {
 #define QI_CC_DID(did)		(((u64)did) << 16)
 #define QI_CC_GRAN(gran)	(((u64)gran) >> (DMA_CCMD_INVL_GRANU_OFFSET-4))
 
+#define QI_DEV_IOTLB_SID(sid)	((u64)((sid) & 0xffff) << 32)
+#define QI_DEV_IOTLB_QDEP(qdep)	(((qdep) & 0x1f) << 16)
+#define QI_DEV_IOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
+#define QI_DEV_IOTLB_SIZE	1
+#define QI_DEV_IOTLB_MAX_INVS	32
+
 struct qi_desc {
 	u64 low, high;
 };
@@ -329,6 +339,8 @@ extern int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
 extern int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type,
 			  int non_present_entry_flush);
+extern int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
+			       u64 addr, unsigned mask);
 
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 
-- 
1.5.6.4


* [PATCH v4 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
                   ` (3 preceding siblings ...)
  2009-03-23  7:59 ` [PATCH v4 4/6] VT-d: add device IOTLB invalidation support Yu Zhao
@ 2009-03-23  7:59 ` Yu Zhao
  2009-03-23  7:59 ` [PATCH v4 6/6] VT-d: support the device IOTLB Yu Zhao
  2009-03-26 23:15 ` [PATCH v4 0/6] PCI: support the ATS capability Jesse Barnes
  6 siblings, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:59 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

Make iommu_flush_iotlb_psi() and flush_unmaps() more readable.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/intel-iommu.c |   46 +++++++++++++++++++++-----------------------
 1 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f3f6865..3145368 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -927,30 +927,27 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-	unsigned int mask;
+	int rc;
+	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
 
 	BUG_ON(addr & (~VTD_PAGE_MASK));
 	BUG_ON(pages == 0);
 
-	/* Fallback to domain selective flush if no PSI support */
-	if (!cap_pgsel_inv(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-						DMA_TLB_DSI_FLUSH,
-						non_present_entry_flush);
-
 	/*
+	 * Fallback to domain selective flush if no PSI support or the size is
+	 * too big.
 	 * PSI requires page size to be 2 ^ x, and the base address is naturally
 	 * aligned to the size
 	 */
-	mask = ilog2(__roundup_pow_of_two(pages));
-	/* Fallback to domain selective flush if size is too big */
-	if (mask > cap_max_amask_val(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-			DMA_TLB_DSI_FLUSH, non_present_entry_flush);
-
-	return iommu->flush.flush_iotlb(iommu, did, addr, mask,
-					DMA_TLB_PSI_FLUSH,
-					non_present_entry_flush);
+	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+		rc = iommu->flush.flush_iotlb(iommu, did, 0, 0,
+						DMA_TLB_DSI_FLUSH,
+						non_present_entry_flush);
+	else
+		rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
+						DMA_TLB_PSI_FLUSH,
+						non_present_entry_flush);
+	return rc;
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2303,15 +2300,16 @@ static void flush_unmaps(void)
 		if (!iommu)
 			continue;
 
-		if (deferred_flush[i].next) {
-			iommu->flush.flush_iotlb(iommu, 0, 0, 0,
-						 DMA_TLB_GLOBAL_FLUSH, 0);
-			for (j = 0; j < deferred_flush[i].next; j++) {
-				__free_iova(&deferred_flush[i].domain[j]->iovad,
-						deferred_flush[i].iova[j]);
-			}
-			deferred_flush[i].next = 0;
+		if (!deferred_flush[i].next)
+			continue;
+
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+					 DMA_TLB_GLOBAL_FLUSH, 0);
+		for (j = 0; j < deferred_flush[i].next; j++) {
+			__free_iova(&deferred_flush[i].domain[j]->iovad,
+					deferred_flush[i].iova[j]);
 		}
+		deferred_flush[i].next = 0;
 	}
 
 	list_size = 0;
-- 
1.5.6.4



* [PATCH v4 6/6] VT-d: support the device IOTLB
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
                   ` (4 preceding siblings ...)
  2009-03-23  7:59 ` [PATCH v4 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps Yu Zhao
@ 2009-03-23  7:59 ` Yu Zhao
  2009-03-26 23:15 ` [PATCH v4 0/6] PCI: support the ATS capability Jesse Barnes
  6 siblings, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-23  7:59 UTC (permalink / raw)
  To: dwmw2, jbarnes; +Cc: linux-pci, iommu, kvm, linux-kernel, Yu Zhao

Enable the device IOTLB (i.e. ATS) for both the bare metal and KVM
environments.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/intel-iommu.c   |   99 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/intel-iommu.h |    1 +
 2 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 3145368..799bbe5 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -127,6 +127,7 @@ static inline void context_set_fault_enable(struct context_entry *context)
 }
 
 #define CONTEXT_TT_MULTI_LEVEL 0
+#define CONTEXT_TT_DEV_IOTLB	1
 
 static inline void context_set_translation_type(struct context_entry *context,
 						unsigned long value)
@@ -242,6 +243,7 @@ struct device_domain_info {
 	struct list_head global; /* link to global list */
 	u8 bus;			/* PCI bus numer */
 	u8 devfn;		/* PCI devfn number */
+	struct intel_iommu *iommu; /* IOMMU used by this device */
 	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
 	struct dmar_domain *domain; /* pointer to domain */
 };
@@ -924,6 +926,80 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 	return 0;
 }
 
+static struct device_domain_info *
+iommu_support_dev_iotlb(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+	int found = 0;
+	unsigned long flags;
+	struct device_domain_info *info;
+	struct intel_iommu *iommu = device_to_iommu(bus, devfn);
+
+	if (!ecap_dev_iotlb_support(iommu->ecap))
+		return NULL;
+
+	if (!iommu->qi)
+		return NULL;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link)
+		if (info->bus == bus && info->devfn == devfn) {
+			found = 1;
+			break;
+		}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	if (!found || !info->dev)
+		return NULL;
+
+	if (!pci_find_ext_capability(info->dev, PCI_EXT_CAP_ID_ATS))
+		return NULL;
+
+	if (!dmar_find_matched_atsr_unit(info->dev))
+		return NULL;
+
+	info->iommu = iommu;
+
+	return info;
+}
+
+static void iommu_enable_dev_iotlb(struct device_domain_info *info)
+{
+	if (!info)
+		return;
+
+	pci_enable_ats(info->dev, VTD_PAGE_SHIFT);
+}
+
+static void iommu_disable_dev_iotlb(struct device_domain_info *info)
+{
+	if (!info->dev || !pci_ats_enabled(info->dev))
+		return;
+
+	pci_disable_ats(info->dev);
+}
+
+static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
+				  u64 addr, unsigned mask)
+{
+	int rc;
+	u16 sid, qdep;
+	unsigned long flags;
+	struct device_domain_info *info;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link) {
+		if (!info->dev || !pci_ats_enabled(info->dev))
+			continue;
+
+		sid = info->bus << 8 | info->devfn;
+		qdep = pci_ats_queue_depth(info->dev);
+		rc = qi_flush_dev_iotlb(info->iommu, sid, qdep, addr, mask);
+		if (rc)
+			dev_err(&info->dev->dev, "flush IOTLB failed\n");
+	}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
@@ -947,6 +1023,9 @@ static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 		rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
 						DMA_TLB_PSI_FLUSH,
 						non_present_entry_flush);
+	if (!rc && !non_present_entry_flush)
+		iommu_flush_dev_iotlb(iommu->domains[did], addr, mask);
+
 	return rc;
 }
 
@@ -1471,6 +1550,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	unsigned long ndomains;
 	int id;
 	int agaw;
+	struct device_domain_info *info;
 
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1536,7 +1616,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	context_set_domain_id(context, id);
 	context_set_address_width(context, iommu->agaw);
 	context_set_address_root(context, virt_to_phys(pgd));
-	context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
+	info = iommu_support_dev_iotlb(domain, bus, devfn);
+	context_set_translation_type(context,
+		info ? CONTEXT_TT_DEV_IOTLB : CONTEXT_TT_MULTI_LEVEL);
 	context_set_fault_enable(context);
 	context_set_present(context);
 	domain_flush_cache(domain, context, sizeof(*context));
@@ -1549,6 +1631,8 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	else
 		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_DSI_FLUSH, 0);
 
+	iommu_enable_dev_iotlb(info);
+
 	spin_unlock_irqrestore(&iommu->lock, flags);
 
 	spin_lock_irqsave(&domain->iommu_lock, flags);
@@ -1689,6 +1773,7 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
 			info->dev->dev.archdata.iommu = NULL;
 		spin_unlock_irqrestore(&device_domain_lock, flags);
 
+		iommu_disable_dev_iotlb(info);
 		iommu = device_to_iommu(info->bus, info->devfn);
 		iommu_detach_dev(iommu, info->bus, info->devfn);
 		free_devinfo_mem(info);
@@ -2306,8 +2391,14 @@ static void flush_unmaps(void)
 		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
 					 DMA_TLB_GLOBAL_FLUSH, 0);
 		for (j = 0; j < deferred_flush[i].next; j++) {
-			__free_iova(&deferred_flush[i].domain[j]->iovad,
-					deferred_flush[i].iova[j]);
+			unsigned long mask;
+			struct iova *iova = deferred_flush[i].iova[j];
+
+			mask = (iova->pfn_hi - iova->pfn_lo + 1) << PAGE_SHIFT;
+			mask = ilog2(mask >> VTD_PAGE_SHIFT);
+			iommu_flush_dev_iotlb(deferred_flush[i].domain[j],
+					iova->pfn_lo << PAGE_SHIFT, mask);
+			__free_iova(&deferred_flush[i].domain[j]->iovad, iova);
 		}
 		deferred_flush[i].next = 0;
 	}
@@ -2794,6 +2885,7 @@ static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 				info->dev->dev.archdata.iommu = NULL;
 			spin_unlock_irqrestore(&device_domain_lock, flags);
 
+			iommu_disable_dev_iotlb(info);
 			iommu_detach_dev(iommu, info->bus, info->devfn);
 			free_devinfo_mem(info);
 
@@ -2842,6 +2934,7 @@ static void vm_domain_remove_all_dev_info(struct dmar_domain *domain)
 
 		spin_unlock_irqrestore(&device_domain_lock, flags1);
 
+		iommu_disable_dev_iotlb(info);
 		iommu = device_to_iommu(info->bus, info->devfn);
 		iommu_detach_dev(iommu, info->bus, info->devfn);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a32b3db..e182286 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -123,6 +123,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 #define ecap_qis(e)		((e) & 0x2)
 #define ecap_eim_support(e)	((e >> 4) & 0x1)
 #define ecap_ir_support(e)	((e >> 3) & 0x1)
+#define ecap_dev_iotlb_support(e)	(((e) >> 2) & 0x1)
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 
 
-- 
1.5.6.4



* Re: [PATCH v4 0/6] PCI: support the ATS capability
  2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
                   ` (5 preceding siblings ...)
  2009-03-23  7:59 ` [PATCH v4 6/6] VT-d: support the device IOTLB Yu Zhao
@ 2009-03-26 23:15 ` Jesse Barnes
  2009-03-29  5:21   ` Grant Grundler
  2009-03-29 13:51   ` Matthew Wilcox
  6 siblings, 2 replies; 12+ messages in thread
From: Jesse Barnes @ 2009-03-26 23:15 UTC (permalink / raw)
  To: Yu Zhao; +Cc: dwmw2, linux-pci, iommu, kvm, linux-kernel, Yu Zhao

On Mon, 23 Mar 2009 15:58:56 +0800
Yu Zhao <yu.zhao@intel.com> wrote:

> This patch series implements Address Translation Service support for
> the Intel IOMMU. The PCIe Endpoint that supports ATS capability can
> request the DMA address translation from the IOMMU and cache the
> translation itself. This can alleviate IOMMU TLB pressure and improve
> the hardware performance in the I/O virtualization environment.
> 
> The ATS is one of PCI-SIG I/O Virtualization (IOV) Specifications. The
> spec can be found at: http://www.pcisig.com/specifications/iov/ats/
> (it requires membership).
> 
> 
> Changelog:
> v3 -> v4
>   1, coding style fixes (Grant Grundler)
>   2, support the Virtual Function ATS capability
> 
> v2 -> v3
>   1, throw error message if VT-d hardware detects invalid descriptor
>      on Queued Invalidation interface (David Woodhouse)
>   2, avoid using pci_find_ext_capability every time when reading ATS
>      Invalidate Queue Depth (Matthew Wilcox)
> 
> v1 -> v2
>   added 'static' prefix to a local LIST_HEAD (Andrew Morton)

This is a good sized chunk of new code, and you want it to come through
the PCI tree, right?  It looks like it's seen some review from Grant,
David and Matthew but I don't see any Reviewed-by or Acked-by tags in
there... Anyone willing to provide those?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [PATCH v4 4/6] VT-d: add device IOTLB invalidation support
  2009-03-23  7:59 ` [PATCH v4 4/6] VT-d: add device IOTLB invalidation support Yu Zhao
@ 2009-03-29  5:19   ` Grant Grundler
  0 siblings, 0 replies; 12+ messages in thread
From: Grant Grundler @ 2009-03-29  5:19 UTC (permalink / raw)
  To: Yu Zhao; +Cc: dwmw2, jbarnes, linux-pci, iommu, kvm, linux-kernel

On Mon, Mar 23, 2009 at 03:59:00PM +0800, Yu Zhao wrote:
> Support device IOTLB invalidation to flush the translation cached
> in the Endpoint.
> 
> Signed-off-by: Yu Zhao <yu.zhao@intel.com>
> ---
>  drivers/pci/dmar.c          |   77 ++++++++++++++++++++++++++++++++++++++----
>  include/linux/intel-iommu.h |   14 +++++++-
>  2 files changed, 82 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
> index 106bc45..494b167 100644
> --- a/drivers/pci/dmar.c
> +++ b/drivers/pci/dmar.c
> @@ -674,7 +674,8 @@ void free_iommu(struct intel_iommu *iommu)
>   */
>  static inline void reclaim_free_desc(struct q_inval *qi)
>  {
> -	while (qi->desc_status[qi->free_tail] == QI_DONE) {
> +	while (qi->desc_status[qi->free_tail] == QI_DONE ||
> +	       qi->desc_status[qi->free_tail] == QI_ABORT) {
>  		qi->desc_status[qi->free_tail] = QI_FREE;
>  		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
>  		qi->free_cnt++;
> @@ -684,10 +685,13 @@ static inline void reclaim_free_desc(struct q_inval *qi)
>  static int qi_check_fault(struct intel_iommu *iommu, int index)
>  {
>  	u32 fault;
> -	int head;
> +	int head, tail;
>  	struct q_inval *qi = iommu->qi;
>  	int wait_index = (index + 1) % QI_LENGTH;
>  
> +	if (qi->desc_status[wait_index] == QI_ABORT)
> +		return -EAGAIN;
> +
>  	fault = readl(iommu->reg + DMAR_FSTS_REG);
>  
>  	/*
> @@ -697,7 +701,11 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>  	 */
>  	if (fault & DMA_FSTS_IQE) {
>  		head = readl(iommu->reg + DMAR_IQH_REG);
> -		if ((head >> 4) == index) {
> +		if ((head >> DMAR_IQ_OFFSET) == index) {

Yu,
DMAR_IQ_OFFSET should probably be called DMAR_IQ_SHIFT since it's used the
same way that "PAGE_SHIFT" is used.
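To make the naming point concrete, here is a minimal illustrative sketch (not the actual kernel code; the shift width of 4 is taken from the patch's existing `head >> 4` usage) showing the constant acting as a shift in both directions, the same role PAGE_SHIFT plays for addresses:

```c
#include <assert.h>

/* Illustrative sketch: the invalidation queue head/tail registers
 * carry the descriptor index shifted left by 4 bits, so the constant
 * converts between the raw register value and a queue index --
 * shift semantics, hence the suggested name DMAR_IQ_SHIFT. */
#define DMAR_IQ_SHIFT 4

static unsigned int iq_reg_to_index(unsigned int reg)
{
	return reg >> DMAR_IQ_SHIFT;	/* raw register value -> queue index */
}

static unsigned int iq_index_to_reg(unsigned int index)
{
	return index << DMAR_IQ_SHIFT;	/* queue index -> raw register value */
}
```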

I've looked through the rest of the code and don't see any problems.
But I also don't have a clue what "ITE" (in IOMMU context) is. I'm assuming
it has something to do with translation errors but have no idea about
where/when those are generated and what the outcome is.

thanks,
grant

> +			printk(KERN_ERR "VT-d detected invalid descriptor: "
> +				"low=%llx, high=%llx\n",
> +				(unsigned long long)qi->desc[index].low,
> +				(unsigned long long)qi->desc[index].high);
>  			memcpy(&qi->desc[index], &qi->desc[wait_index],
>  					sizeof(struct qi_desc));
>  			__iommu_flush_cache(iommu, &qi->desc[index],
> @@ -707,6 +715,32 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>  		}
>  	}
>  
> +	/*
> +	 * If ITE happens, all pending wait_desc commands are aborted.
> +	 * No new descriptors are fetched until the ITE is cleared.
> +	 */
> +	if (fault & DMA_FSTS_ITE) {
> +		head = readl(iommu->reg + DMAR_IQH_REG);
> +		head = ((head >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
> +		head |= 1;
> +		tail = readl(iommu->reg + DMAR_IQT_REG);
> +		tail = ((tail >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
> +
> +		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
> +
> +		do {
> +			if (qi->desc_status[head] == QI_IN_USE)
> +				qi->desc_status[head] = QI_ABORT;
> +			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> +		} while (head != tail);
> +
> +		if (qi->desc_status[wait_index] == QI_ABORT)
> +			return -EAGAIN;
> +	}
> +
> +	if (fault & DMA_FSTS_ICE)
> +		writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
> +
>  	return 0;
>  }
>  
> @@ -716,7 +750,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>   */
>  int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  {
> -	int rc = 0;
> +	int rc;
>  	struct q_inval *qi = iommu->qi;
>  	struct qi_desc *hw, wait_desc;
>  	int wait_index, index;
> @@ -727,6 +761,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  
>  	hw = qi->desc;
>  
> +restart:
> +	rc = 0;
> +
>  	spin_lock_irqsave(&qi->q_lock, flags);
>  	while (qi->free_cnt < 3) {
>  		spin_unlock_irqrestore(&qi->q_lock, flags);
> @@ -757,7 +794,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	 * update the HW tail register indicating the presence of
>  	 * new descriptors.
>  	 */
> -	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
> +	writel(qi->free_head << DMAR_IQ_OFFSET, iommu->reg + DMAR_IQT_REG);
>  
>  	while (qi->desc_status[wait_index] != QI_DONE) {
>  		/*
> @@ -769,18 +806,21 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  		 */
>  		rc = qi_check_fault(iommu, index);
>  		if (rc)
> -			goto out;
> +			break;
>  
>  		spin_unlock(&qi->q_lock);
>  		cpu_relax();
>  		spin_lock(&qi->q_lock);
>  	}
> -out:
> -	qi->desc_status[index] = qi->desc_status[wait_index] = QI_DONE;
> +
> +	qi->desc_status[index] = QI_DONE;
>  
>  	reclaim_free_desc(qi);
>  	spin_unlock_irqrestore(&qi->q_lock, flags);
>  
> +	if (rc == -EAGAIN)
> +		goto restart;
> +
>  	return rc;
>  }
>  
> @@ -847,6 +887,27 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>  	return qi_submit_sync(&desc, iommu);
>  }
>  
> +int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> +			u64 addr, unsigned mask)
> +{
> +	struct qi_desc desc;
> +
> +	if (mask) {
> +		BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
> +		addr |= (1 << (VTD_PAGE_SHIFT + mask - 1)) - 1;
> +		desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
> +	} else
> +		desc.high = QI_DEV_IOTLB_ADDR(addr);
> +
> +	if (qdep >= QI_DEV_IOTLB_MAX_INVS)
> +		qdep = 0;
> +
> +	desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
> +		   QI_DIOTLB_TYPE;
> +
> +	return qi_submit_sync(&desc, iommu);
> +}
> +
>  /*
>   * Enable Queued Invalidation interface. This is a must to support
>   * interrupt-remapping. Also used by DMA-remapping, which replaces
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 660a7f4..a32b3db 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -53,6 +53,7 @@
>  #define	DMAR_PHMLIMIT_REG 0x78	/* pmrr high limit */
>  #define DMAR_IQH_REG	0x80	/* Invalidation queue head register */
>  #define DMAR_IQT_REG	0x88	/* Invalidation queue tail register */
> +#define DMAR_IQ_OFFSET	4	/* Invalidation queue head/tail offset */
>  #define DMAR_IQA_REG	0x90	/* Invalidation queue addr register */
>  #define DMAR_ICS_REG	0x98	/* Invalidation complete status register */
>  #define DMAR_IRTA_REG	0xb8    /* Interrupt remapping table addr register */
> @@ -195,6 +196,8 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
>  #define DMA_FSTS_PPF ((u32)2)
>  #define DMA_FSTS_PFO ((u32)1)
>  #define DMA_FSTS_IQE (1 << 4)
> +#define DMA_FSTS_ICE (1 << 5)
> +#define DMA_FSTS_ITE (1 << 6)
>  #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
>  
>  /* FRCD_REG, 32 bits access */
> @@ -223,7 +226,8 @@ do {									\
>  enum {
>  	QI_FREE,
>  	QI_IN_USE,
> -	QI_DONE
> +	QI_DONE,
> +	QI_ABORT
>  };
>  
>  #define QI_CC_TYPE		0x1
> @@ -252,6 +256,12 @@ enum {
>  #define QI_CC_DID(did)		(((u64)did) << 16)
>  #define QI_CC_GRAN(gran)	(((u64)gran) >> (DMA_CCMD_INVL_GRANU_OFFSET-4))
>  
> +#define QI_DEV_IOTLB_SID(sid)	((u64)((sid) & 0xffff) << 32)
> +#define QI_DEV_IOTLB_QDEP(qdep)	(((qdep) & 0x1f) << 16)
> +#define QI_DEV_IOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
> +#define QI_DEV_IOTLB_SIZE	1
> +#define QI_DEV_IOTLB_MAX_INVS	32
> +
>  struct qi_desc {
>  	u64 low, high;
>  };
> @@ -329,6 +339,8 @@ extern int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
>  extern int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>  			  unsigned int size_order, u64 type,
>  			  int non_present_entry_flush);
> +extern int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> +			       u64 addr, unsigned mask);
>  
>  extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
>  
> -- 
> 1.5.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v4 0/6] PCI: support the ATS capability
  2009-03-26 23:15 ` [PATCH v4 0/6] PCI: support the ATS capability Jesse Barnes
@ 2009-03-29  5:21   ` Grant Grundler
  2009-03-29 13:51   ` Matthew Wilcox
  1 sibling, 0 replies; 12+ messages in thread
From: Grant Grundler @ 2009-03-29  5:21 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Yu Zhao, dwmw2, linux-pci, iommu, kvm, linux-kernel

On Thu, Mar 26, 2009 at 04:15:56PM -0700, Jesse Barnes wrote:
...
> This is a good sized chunk of new code, and you want it to come through
> the PCI tree, right?  It looks like it's seen some review from Grant,
> David and Matthew but I don't see any Reviewed-by or Acked-by tags in
> there... Anyone willing to provide those?

Sorry, I'm not. I've read through the code but don't understand many
of the details of how this particular HW works. All I can do is pick
nits.

cheers,
grant


* Re: [PATCH v4 0/6] PCI: support the ATS capability
  2009-03-26 23:15 ` [PATCH v4 0/6] PCI: support the ATS capability Jesse Barnes
  2009-03-29  5:21   ` Grant Grundler
@ 2009-03-29 13:51   ` Matthew Wilcox
  2009-03-30  1:56     ` Yu Zhao
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2009-03-29 13:51 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Yu Zhao, dwmw2, linux-pci, iommu, kvm, linux-kernel

On Thu, Mar 26, 2009 at 04:15:56PM -0700, Jesse Barnes wrote:
> >   2, avoid using pci_find_ext_capability every time when reading ATS
> >      Invalidate Queue Depth (Matthew Wilcox)

I asked a question about how that was used, and got back a version which
changed how it was done.  I still don't have an answer to my question.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: [PATCH v4 0/6] PCI: support the ATS capability
  2009-03-29 13:51   ` Matthew Wilcox
@ 2009-03-30  1:56     ` Yu Zhao
  0 siblings, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2009-03-30  1:56 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jesse Barnes, dwmw2, linux-pci, iommu, kvm, linux-kernel

On Sun, Mar 29, 2009 at 09:51:31PM +0800, Matthew Wilcox wrote:
> On Thu, Mar 26, 2009 at 04:15:56PM -0700, Jesse Barnes wrote:
> > >   2, avoid using pci_find_ext_capability every time when reading ATS
> > >      Invalidate Queue Depth (Matthew Wilcox)
> 
> I asked a question about how that was used, and got back a version which
> changed how it was done.  I still don't have an answer to my question.

The VT-d hardware is designed so that the Invalidate Queue Depth is used
every time the software prepares an Invalidate Request descriptor.
This happens whenever a device's IOMMU mapping changes (i.e. the device
driver calls DMA map/unmap if the device is in use by the host, or a
guest is started/destroyed if the device is assigned to that guest).

Given that DMA map/unmap are called very frequently, the queue depth
should be cached somewhere. It used to be cached in the VT-d private
data structure (before v3) because I wasn't sure how IOMMU hardware
from other vendors uses the queue depth.

After you commented on the code, I checked the AMD/IBM/Sun IOMMUs: the
AMD IOMMU also uses the queue depth for every Invalidate Request
descriptor, while the IBM/Sun IOMMUs don't appear to support ATS. So it's
reasonable to cache the queue depth in the PCI subsystem, since all
IOMMUs that support ATS use the queue depth in the same way (very
frequently), right?
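The caching scheme described above can be sketched roughly as follows. This is a standalone illustration, not the kernel implementation: the names (ats_cache, slow_read_qdep) and the depth value 32 are hypothetical stand-ins for the capability walk and config-space read.

```c
#include <assert.h>

/* Sketch of the pattern: walk the extended capability list and read
 * the ATS Invalidate Queue Depth once, at enable time, then serve the
 * hot path (building invalidation descriptors) from the cached copy. */
struct ats_cache {
	int enabled;
	int qdep;	/* Invalidate Queue Depth, cached at enable time */
};

static int slow_read_qdep(void)
{
	/* stand-in for pci_find_ext_capability() + a config-space read;
	 * 32 is an arbitrary illustrative depth */
	return 32;
}

static void ats_enable(struct ats_cache *a)
{
	a->qdep = slow_read_qdep();	/* the only config-space walk */
	a->enabled = 1;
}

static int ats_queue_depth(const struct ats_cache *a)
{
	/* hot path: no capability walk, just the cached value */
	return a->enabled ? a->qdep : -1;
}
```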


Thread overview: 12+ messages
2009-03-23  7:58 [PATCH v4 0/6] PCI: support the ATS capability Yu Zhao
2009-03-23  7:58 ` [PATCH v4 1/6] " Yu Zhao
2009-03-23  7:58 ` [PATCH v4 2/6] PCI: handle Virtual Function ATS enabling Yu Zhao
2009-03-23  7:58 ` [PATCH v4 3/6] VT-d: parse ATSR in DMA Remapping Reporting Structure Yu Zhao
2009-03-23  7:59 ` [PATCH v4 4/6] VT-d: add device IOTLB invalidation support Yu Zhao
2009-03-29  5:19   ` Grant Grundler
2009-03-23  7:59 ` [PATCH v4 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps Yu Zhao
2009-03-23  7:59 ` [PATCH v4 6/6] VT-d: support the device IOTLB Yu Zhao
2009-03-26 23:15 ` [PATCH v4 0/6] PCI: support the ATS capability Jesse Barnes
2009-03-29  5:21   ` Grant Grundler
2009-03-29 13:51   ` Matthew Wilcox
2009-03-30  1:56     ` Yu Zhao
