All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-05-28 18:40 Alex Williamson
  2013-05-28 18:40   ` Alex Williamson
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 18:40 UTC (permalink / raw)
  To: iommu, dwmw2, joro; +Cc: stephen, linux-pci, ddutile

This series tries to address:

https://bugzilla.kernel.org/show_bug.cgi?id=44881

Where pci_find_upstream_pcie_bridge() gets lost trying to find the
upstream PCIe-to-PCI bridge for a device because the bridge doesn't
expose a PCIe capability.  To do this, we add a iommu_pci_is_pcie_bridge
function which includes a quirk to look to the next upstream device
as a sanity check.  We can then replace pci_find_upstream_pcie_bridge
with a function that's a bit more generic and less tied to intel-iommu
eccentricities.

v2 uses the same logic as v1, but moves the search and match code to
IOMMU-core since PCI-core doesn't want it.  v1 has several reports
from users that this solves the problem they have in the above bz.
Thanks,

Alex

---

Alex Williamson (2):
      iommu: Quirked PCIe bridge test and search function
      intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge


 drivers/iommu/Kconfig               |    5 ++
 drivers/iommu/Makefile              |    1 
 drivers/iommu/intel-iommu.c         |   77 ++++++++++++++++++++++-------------
 drivers/iommu/intel_irq_remapping.c |   15 +++++--
 drivers/iommu/pci.c                 |   69 +++++++++++++++++++++++++++++++
 drivers/iommu/pci.h                 |   23 ++++++++++
 6 files changed, 157 insertions(+), 33 deletions(-)
 create mode 100644 drivers/iommu/pci.c

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-05-28 18:40   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 18:40 UTC (permalink / raw)
  To: iommu, dwmw2, joro; +Cc: stephen, linux-pci, ddutile

Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
and do not include a PCIe capability.  pci_is_pcie() is not useful on
these devices, making it difficult to determine where we transition
from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
bridge test for fear of confusion that a PCIe capability would be
expected, so implement it in IOMMU code.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/Kconfig  |    3 ++
 drivers/iommu/Makefile |    1 +
 drivers/iommu/pci.c    |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/pci.h    |   23 ++++++++++++++++
 4 files changed, 96 insertions(+)
 create mode 100644 drivers/iommu/pci.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c332fb9..a591794 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -2,6 +2,9 @@
 config IOMMU_API
 	bool
 
+config IOMMU_PCI
+	bool
+
 menuconfig IOMMU_SUPPORT
 	bool "IOMMU Hardware Support"
 	default y
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index ef0e520..f14d905 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,3 +15,4 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o
 obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o
+obj-$(CONFIG_IOMMU_PCI) += pci.o
diff --git a/drivers/iommu/pci.c b/drivers/iommu/pci.c
new file mode 100644
index 0000000..c3cae8d
--- /dev/null
+++ b/drivers/iommu/pci.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2013 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/pci.h>
+
+bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev)
+{
+	if (!pdev->subordinate)
+		return false;
+
+	if (pci_is_pcie(pdev))
+		return true;
+
+#ifdef CONFIG_PCI_QUIRKS
+	/*
+	 * If we're not on the root bus, look one device upstream of the
+	 * current device.  If that device is PCIe and is not a PCIe-to-PCI
+	 * bridge, then the current device is effectively PCIe as it must
+	 * be the PCIe-to-PCI bridge.  This handles several bridges that
+	 * violate the PCIe spec by not exposing a PCIe capability:
+	 * https://bugzilla.kernel.org/show_bug.cgi?id=44881
+	 */
+	if (!pci_is_root_bus(pdev->bus)) {
+		struct pci_dev *parent = pdev->bus->self;
+
+		if (pci_is_pcie(parent) &&
+		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
+			return true;
+	}
+#endif
+	return false;
+}
+
+struct pci_dev *iommu_pci_find_upstream(struct pci_dev *pdev,
+					bool (*match)(struct pci_dev *),
+					struct pci_dev **last)
+{
+	*last = NULL;
+
+	if (match(pdev))
+		return pdev;
+
+	*last = pdev;
+
+	while (!pci_is_root_bus(pdev->bus)) {
+		*last = pdev;
+		pdev = pdev->bus->self;
+
+		if (match(pdev))
+			return pdev;
+	}
+
+	return NULL;
+}
diff --git a/drivers/iommu/pci.h b/drivers/iommu/pci.h
index 352d80a..cc6d58a 100644
--- a/drivers/iommu/pci.h
+++ b/drivers/iommu/pci.h
@@ -26,4 +26,27 @@ static inline void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
 	*from = to;
 }
 
+/**
+ * iommu_pci_is_pcie_bridge - Match a PCIe bridge device
+ * @pdev: device to test
+ *
+ * Return true if the given device is a PCIe bridge, false otherwise.
+ */
+bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev);
+
+/**
+ * iommu_pci_find_upstream - Generic upstream search function
+ * @pdev: starting PCI device to search
+ * @match: match function to call on each device (true = match)
+ * @last: last device examined prior to returned device
+ *
+ * Walk upstream from the given device, calling match() at each
+ * device, including self.  Returns the first device matching match().
+ * If the root bus is reached without finding a match, return NULL.
+ * last returns the N-1 step in the search path.
+ */
+struct pci_dev *iommu_pci_find_upstream(struct pci_dev *pdev,
+					bool (*match)(struct pci_dev *),
+					struct pci_dev **last);
+
 #endif  /* __IOMMU_PCI_H */


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-05-28 18:40   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 18:40 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	linux-pci-u79uwXL29TY76Z2rM5mHXA

Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
and do not include a PCIe capability.  pci_is_pcie() is not useful on
these devices, making it difficult to determine where we transition
from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
bridge test for fear of confusion that a PCIe capability would be
expected, so implement it in IOMMU code.

Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/Kconfig  |    3 ++
 drivers/iommu/Makefile |    1 +
 drivers/iommu/pci.c    |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/pci.h    |   23 ++++++++++++++++
 4 files changed, 96 insertions(+)
 create mode 100644 drivers/iommu/pci.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c332fb9..a591794 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -2,6 +2,9 @@
 config IOMMU_API
 	bool
 
+config IOMMU_PCI
+	bool
+
 menuconfig IOMMU_SUPPORT
 	bool "IOMMU Hardware Support"
 	default y
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index ef0e520..f14d905 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,3 +15,4 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o
 obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o
+obj-$(CONFIG_IOMMU_PCI) += pci.o
diff --git a/drivers/iommu/pci.c b/drivers/iommu/pci.c
new file mode 100644
index 0000000..c3cae8d
--- /dev/null
+++ b/drivers/iommu/pci.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2013 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/pci.h>
+
+bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev)
+{
+	if (!pdev->subordinate)
+		return false;
+
+	if (pci_is_pcie(pdev))
+		return true;
+
+#ifdef CONFIG_PCI_QUIRKS
+	/*
+	 * If we're not on the root bus, look one device upstream of the
+	 * current device.  If that device is PCIe and is not a PCIe-to-PCI
+	 * bridge, then the current device is effectively PCIe as it must
+	 * be the PCIe-to-PCI bridge.  This handles several bridges that
+	 * violate the PCIe spec by not exposing a PCIe capability:
+	 * https://bugzilla.kernel.org/show_bug.cgi?id=44881
+	 */
+	if (!pci_is_root_bus(pdev->bus)) {
+		struct pci_dev *parent = pdev->bus->self;
+
+		if (pci_is_pcie(parent) &&
+		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
+			return true;
+	}
+#endif
+	return false;
+}
+
+struct pci_dev *iommu_pci_find_upstream(struct pci_dev *pdev,
+					bool (*match)(struct pci_dev *),
+					struct pci_dev **last)
+{
+	*last = NULL;
+
+	if (match(pdev))
+		return pdev;
+
+	*last = pdev;
+
+	while (!pci_is_root_bus(pdev->bus)) {
+		*last = pdev;
+		pdev = pdev->bus->self;
+
+		if (match(pdev))
+			return pdev;
+	}
+
+	return NULL;
+}
diff --git a/drivers/iommu/pci.h b/drivers/iommu/pci.h
index 352d80a..cc6d58a 100644
--- a/drivers/iommu/pci.h
+++ b/drivers/iommu/pci.h
@@ -26,4 +26,27 @@ static inline void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
 	*from = to;
 }
 
+/**
+ * iommu_pci_is_pcie_bridge - Match a PCIe bridge device
+ * @pdev: device to test
+ *
+ * Return true if the given device is a PCIe bridge, false otherwise.
+ */
+bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev);
+
+/**
+ * iommu_pci_find_upstream - Generic upstream search function
+ * @pdev: starting PCI device to search
+ * @match: match function to call on each device (true = match)
+ * @last: last device examined prior to returned device
+ *
+ * Walk upstream from the given device, calling match() at each
+ * device, including self.  Returns the first device matching match().
+ * If the root bus is reached without finding a match, return NULL.
+ * last returns the N-1 step in the search path.
+ */
+struct pci_dev *iommu_pci_find_upstream(struct pci_dev *pdev,
+					bool (*match)(struct pci_dev *),
+					struct pci_dev **last);
+
 #endif  /* __IOMMU_PCI_H */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 2/2] intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge
  2013-05-28 18:40 [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges Alex Williamson
  2013-05-28 18:40   ` Alex Williamson
@ 2013-05-28 18:40 ` Alex Williamson
  2013-05-28 22:09   ` Bjorn Helgaas
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 18:40 UTC (permalink / raw)
  To: iommu, dwmw2, joro; +Cc: stephen, linux-pci, ddutile

pci_find_upstream_pcie_bridge() attempts to return the PCIe-to-PCI
bridge upstream from the provided device.  It currently returns NULL
if 1) the provided device is PCIe (ie. it does not have such an
upstream bridge), 2) the device is on the root bus (a subtle case of
its claim to return the parent device when not connected to a PCIe
bridge), or 3) when it gets lost walking the bus and finds itself on
a PCIe device that is not a PCIe-to-PCI bridge.  This latter case not
only generates a WARN, but treats the provided device the same as if
it were PCIe.

We can replace 1) and 2) with explicit tests in the IOMMU code as this
is the code that knows the visibility of devices at the root bus.
That leaves the bus walk case 3), where the NULL return is actually an
error.

We hit this error because some PCIe-to-PCI bridge vendors ignore the
PCIe spec requirement that all PCIe devices must include a PCIe
capability, which is all that pci_is_pcie() actually tests.  When
PCI quirks are enabled, iommu_pci_is_pcie_bridge() adds one more test
by looking at the next upstream device to test whether it is PCIe with
a type code of a PCIe-to-PCI/PCI-X bridge.  If it does not have this
type code, then by deduction, the current bridge must be a PCIe-to-PCI
bridge and thus be a PCIe bridge.

Therefore, pci_find_upstream_pcie_bridge() is replaced by an
explicit call to (pci_is_pcie() || pci_is_root_bus()) plus
iommu_pci_find_upstream() with the iommu_pci_is_pcie_bridge() match
function.  This either returns the first PCIe bridge, which must
be a PCIe-to-PCI bridge or NULL if not found.  In the NULL case, we
know the original device is not on the root bus (already tested for
that), so we can safely walk from the provided device to the last
device and fall into the legacy PCI bridge code.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=44881

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/Kconfig               |    2 +
 drivers/iommu/intel-iommu.c         |   77 ++++++++++++++++++++++-------------
 drivers/iommu/intel_irq_remapping.c |   15 +++++--
 3 files changed, 61 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a591794..71ec2d5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -85,6 +85,7 @@ config INTEL_IOMMU
 	depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC)
 	select IOMMU_API
 	select DMAR_TABLE
+	select IOMMU_PCI
 	help
 	  DMA remapping (DMAR) devices support enables independent address
 	  translations for Direct Memory Access (DMA) from devices.
@@ -125,6 +126,7 @@ config IRQ_REMAP
 	bool "Support for Interrupt Remapping"
 	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
 	select DMAR_TABLE
+	select IOMMU_PCI
 	---help---
 	  Supports Interrupt remapping for IO-APIC and MSI devices.
 	  To use x2apic mode in the CPU's which support x2APIC enhancements or
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b4f0e28..3748187 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1689,9 +1689,13 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
 		return ret;
 
 	/* dependent device mapping */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (!tmp)
+	if (pci_is_pcie(pdev) || pci_is_root_bus(pdev->bus))
 		return 0;
+
+	tmp = iommu_pci_find_upstream(pdev, iommu_pci_is_pcie_bridge, &parent);
+	if (!tmp)
+		tmp = parent;
+
 	/* Secondary interface's bus number and devfn 0 */
 	parent = pdev->bus->self;
 	while (parent != tmp) {
@@ -1703,7 +1707,7 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
 			return ret;
 		parent = parent->bus->self;
 	}
-	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
+	if (iommu_pci_is_pcie_bridge(tmp)) /* this is a PCIe-to-PCI bridge */
 		return domain_context_mapping_one(domain,
 					pci_domain_nr(tmp->subordinate),
 					tmp->subordinate->number, 0,
@@ -1731,9 +1735,13 @@ static int domain_context_mapped(struct pci_dev *pdev)
 	if (!ret)
 		return ret;
 	/* dependent device mapping */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
+	if (pci_is_pcie(pdev) || pci_is_root_bus(pdev->bus))
+		return 0;
+
+	tmp = iommu_pci_find_upstream(pdev, iommu_pci_is_pcie_bridge, &parent);
 	if (!tmp)
-		return ret;
+		tmp = parent;
+
 	/* Secondary interface's bus number and devfn 0 */
 	parent = pdev->bus->self;
 	while (parent != tmp) {
@@ -1743,7 +1751,7 @@ static int domain_context_mapped(struct pci_dev *pdev)
 			return ret;
 		parent = parent->bus->self;
 	}
-	if (pci_is_pcie(tmp))
+	if (iommu_pci_is_pcie_bridge(tmp))
 		return device_context_mapped(iommu, tmp->subordinate->number,
 					     0);
 	else
@@ -1974,7 +1982,7 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	struct intel_iommu *iommu;
 	struct dmar_drhd_unit *drhd;
 	struct device_domain_info *info, *tmp;
-	struct pci_dev *dev_tmp;
+	struct pci_dev *dev_tmp = NULL;
 	unsigned long flags;
 	int bus = 0, devfn = 0;
 	int segment;
@@ -1986,9 +1994,16 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 
 	segment = pci_domain_nr(pdev->bus);
 
-	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (dev_tmp) {
-		if (pci_is_pcie(dev_tmp)) {
+	if (!pci_is_pcie(pdev) && !pci_is_root_bus(pdev->bus)) {
+		struct pci_dev *last;
+
+		dev_tmp = iommu_pci_find_upstream(pdev,
+						  iommu_pci_is_pcie_bridge,
+						  &last);
+		if (!dev_tmp)
+			dev_tmp = last;
+
+		if (iommu_pci_is_pcie_bridge(dev_tmp)) {
 			bus = dev_tmp->subordinate->number;
 			devfn = 0;
 		} else {
@@ -3754,26 +3769,24 @@ static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
 {
 	struct pci_dev *tmp, *parent;
 
-	if (!iommu || !pdev)
+	if (!iommu || !pdev || pci_is_pcie(pdev) || pci_is_root_bus(pdev->bus))
 		return;
 
 	/* dependent device detach */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
+	tmp = iommu_pci_find_upstream(pdev, iommu_pci_is_pcie_bridge, &parent);
+	if (!tmp)
+		tmp = parent;
+
 	/* Secondary interface's bus number and devfn 0 */
-	if (tmp) {
-		parent = pdev->bus->self;
-		while (parent != tmp) {
-			iommu_detach_dev(iommu, parent->bus->number,
-					 parent->devfn);
-			parent = parent->bus->self;
-		}
-		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
-			iommu_detach_dev(iommu,
-				tmp->subordinate->number, 0);
-		else /* this is a legacy PCI bridge */
-			iommu_detach_dev(iommu, tmp->bus->number,
-					 tmp->devfn);
+	parent = pdev->bus->self;
+	while (parent != tmp) {
+		iommu_detach_dev(iommu, parent->bus->number, parent->devfn);
+		parent = parent->bus->self;
 	}
+	if (iommu_pci_is_pcie_bridge(tmp)) /* this is a PCIe-to-PCI bridge */
+		iommu_detach_dev(iommu, tmp->subordinate->number, 0);
+	else /* this is a legacy PCI bridge */
+		iommu_detach_dev(iommu, tmp->bus->number, tmp->devfn);
 }
 
 static void domain_remove_one_dev_info(struct dmar_domain *domain,
@@ -4158,7 +4171,7 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 static int intel_iommu_add_device(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
-	struct pci_dev *bridge, *dma_pdev = NULL;
+	struct pci_dev *dma_pdev = NULL;
 	struct iommu_group *group;
 	int ret;
 
@@ -4166,9 +4179,15 @@ static int intel_iommu_add_device(struct device *dev)
 			     pdev->bus->number, pdev->devfn))
 		return -ENODEV;
 
-	bridge = pci_find_upstream_pcie_bridge(pdev);
-	if (bridge) {
-		if (pci_is_pcie(bridge))
+	if (!pci_is_pcie(pdev) && !pci_is_root_bus(pdev->bus)) {
+		struct pci_dev *bridge, *last;
+
+		bridge = iommu_pci_find_upstream(pdev,iommu_pci_is_pcie_bridge,
+						 &last);
+		if (!bridge)
+			bridge = last;
+
+		if (iommu_pci_is_pcie_bridge(bridge))
 			dma_pdev = pci_get_domain_bus_and_slot(
 						pci_domain_nr(pdev->bus),
 						bridge->subordinate->number, 0);
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 5b19b2d..2a6d27f 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -16,6 +16,7 @@
 #include <asm/msidef.h>
 
 #include "irq_remapping.h"
+#include "pci.h"
 
 struct ioapic_scope {
 	struct intel_iommu *iommu;
@@ -373,7 +374,6 @@ static int set_hpet_sid(struct irte *irte, u8 id)
 
 static int set_msi_sid(struct irte *irte, struct pci_dev *dev)
 {
-	struct pci_dev *bridge;
 
 	if (!irte || !dev)
 		return -1;
@@ -385,9 +385,16 @@ static int set_msi_sid(struct irte *irte, struct pci_dev *dev)
 		return 0;
 	}
 
-	bridge = pci_find_upstream_pcie_bridge(dev);
-	if (bridge) {
-		if (pci_is_pcie(bridge))/* this is a PCIe-to-PCI/PCIX bridge */
+	if (!pci_is_pcie(dev) && !pci_is_root_bus(dev->bus)) {
+		struct pci_dev *bridge, *last;
+
+		bridge = iommu_pci_find_upstream(dev, iommu_pci_is_pcie_bridge,
+						 &last);
+		if (!bridge)
+			bridge = last;
+
+		/* PCIe-to-PCI/PCIX bridge */
+		if (iommu_pci_is_pcie_bridge(bridge))
 			set_irte_sid(irte, SVT_VERIFY_BUS, SQ_ALL_16,
 				(bridge->bus->number << 8) | dev->bus->number);
 		else /* this is a legacy PCI bridge */


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-05-28 18:40   ` Alex Williamson
  (?)
@ 2013-05-28 19:38   ` Stephen Hemminger
  2013-05-28 19:53     ` Alex Williamson
  -1 siblings, 1 reply; 34+ messages in thread
From: Stephen Hemminger @ 2013-05-28 19:38 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joro, linux-pci, ddutile

On Tue, 28 May 2013 12:40:20 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
> and do not include a PCIe capability.  pci_is_pcie() is not useful on
> these devices, making it difficult to determine where we transition
> from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
> bridge test for fear of confusion that a PCIe capability would be
> expected, so implement it in IOMMU code.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

This won't work for me because I run without IOMMU.
The overhead of IOMMU setup per-packet significantly impacts network performance.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-05-28 19:38   ` Stephen Hemminger
@ 2013-05-28 19:53     ` Alex Williamson
  2013-05-28 19:56       ` Stephen Hemminger
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 19:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: iommu, dwmw2, joro, linux-pci, ddutile

On Tue, 2013-05-28 at 12:38 -0700, Stephen Hemminger wrote:
> On Tue, 28 May 2013 12:40:20 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
> > and do not include a PCIe capability.  pci_is_pcie() is not useful on
> > these devices, making it difficult to determine where we transition
> > from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
> > bridge test for fear of confusion that a PCIe capability would be
> > expected, so implement it in IOMMU code.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> This won't work for me because I run without IOMMU.
> The overhead of IOMMU setup per-packet significantly impacts network performance.

You were one of the reporters of this issue:

http://www.spinics.net/lists/linux-pci/msg19579.html

Does this not work for you or are you saying this is not relevant to you
because you no longer attempt to enable the IOMMU?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-05-28 19:53     ` Alex Williamson
@ 2013-05-28 19:56       ` Stephen Hemminger
  2013-05-28 20:15         ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Hemminger @ 2013-05-28 19:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joro, linux-pci, ddutile

On Tue, 28 May 2013 13:53:10 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 2013-05-28 at 12:38 -0700, Stephen Hemminger wrote:
> > On Tue, 28 May 2013 12:40:20 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> > 
> > > Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
> > > and do not include a PCIe capability.  pci_is_pcie() is not useful on
> > > these devices, making it difficult to determine where we transition
> > > from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
> > > bridge test for fear of confusion that a PCIe capability would be
> > > expected, so implement it in IOMMU code.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > This won't work for me because I run without IOMMU.
> > The overhead of IOMMU setup per-packet significantly impacts network performance.
> 
> You were one of the reporters of this issue:
> 
> http://www.spinics.net/lists/linux-pci/msg19579.html
> 
> Does this not work for you or are you saying this is not relevant to you
> because you no longer attempt to enable the IOMMU?  Thanks,
> 
> Alex
> 

For most usage, I gave up on using IOMMU/SR-IOV.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-05-28 19:56       ` Stephen Hemminger
@ 2013-05-28 20:15         ` Alex Williamson
  2013-05-28 20:28           ` Stephen Hemminger
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 20:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: iommu, dwmw2, joro, linux-pci, ddutile

On Tue, 2013-05-28 at 12:56 -0700, Stephen Hemminger wrote:
> On Tue, 28 May 2013 13:53:10 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue, 2013-05-28 at 12:38 -0700, Stephen Hemminger wrote:
> > > On Tue, 28 May 2013 12:40:20 -0600
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > 
> > > > Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
> > > > and do not include a PCIe capability.  pci_is_pcie() is not useful on
> > > > these devices, making it difficult to determine where we transition
> > > > from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
> > > > bridge test for fear of confusion that a PCIe capability would be
> > > > expected, so implement it in IOMMU code.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > 
> > > This won't work for me because I run without IOMMU.
> > > The overhead of IOMMU setup per-packet significantly impacts network performance.
> > 
> > You were one of the reporters of this issue:
> > 
> > http://www.spinics.net/lists/linux-pci/msg19579.html
> > 
> > Does this not work for you or are you saying this is not relevant to you
> > because you no longer attempt to enable the IOMMU?  Thanks,
> > 
> 
> For most usage, I gave up on using IOMMU/SR-IOV.

Ok, let's call that a "don't care" rather than "won't work" for the
purposes of these patches.  I'll take you off the CC if there's another
version.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-05-28 20:15         ` Alex Williamson
@ 2013-05-28 20:28           ` Stephen Hemminger
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2013-05-28 20:28 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joro, linux-pci, ddutile

On Tue, 28 May 2013 14:15:24 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 2013-05-28 at 12:56 -0700, Stephen Hemminger wrote:
> > On Tue, 28 May 2013 13:53:10 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> > 
> > > On Tue, 2013-05-28 at 12:38 -0700, Stephen Hemminger wrote:
> > > > On Tue, 28 May 2013 12:40:20 -0600
> > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > 
> > > > > Some PCIe-to-PCI bridges are not fully compliant with the PCIe spec
> > > > > and do not include a PCIe capability.  pci_is_pcie() is not useful on
> > > > > these devices, making it difficult to determine where we transition
> > > > > from a legacy PCI bus to a PCIe link.  PCI-core doesn't want a quirked
> > > > > bridge test for fear of confusion that a PCIe capability would be
> > > > > expected, so implement it in IOMMU code.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > 
> > > > This won't work for me because I run without IOMMU.
> > > > The overhead of IOMMU setup per-packet significantly impacts network performance.
> > > 
> > > You were one of the reporters of this issue:
> > > 
> > > http://www.spinics.net/lists/linux-pci/msg19579.html
> > > 
> > > Does this not work for you or are you saying this is not relevant to you
> > > because you no longer attempt to enable the IOMMU?  Thanks,
> > > 
> > 
> > For most usage, I gave up on using IOMMU/SR-IOV.
> 
> Ok, let's call that a "don't care" rather than "won't work" for the
> purposes of these patches.  I'll take you off the CC if there's another
> version.  Thanks,
> 
> Alex
> 

Yes, it is a don't care. but leave me on cc, because I may still need it.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-05-28 22:09   ` Bjorn Helgaas
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2013-05-28 22:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: open list:INTEL IOMMU (VT-d),
	David Woodhouse, Joerg Roedel, Stephen Hemminger, linux-pci,
	Don Dutile

On Tue, May 28, 2013 at 12:40 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> This series tries to address:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=44881
>
> Where pci_find_upstream_pcie_bridge() gets lost trying to find the
> upstream PCIe-to-PCI bridge for a device because the bridge doesn't
> expose a PCIe capability.  To do this, we add a iommu_pci_is_pcie_bridge
> function which includes a quirk to look to the next upstream device
> as a sanity check.  We can then replace pci_find_upstream_pcie_bridge
> with a function that's a bit more generic and less tied to intel-iommu
> eccentricities.
>
> v2 uses the same logic as v1, but moves the search and match code to
> IOMMU-core since PCI-core doesn't want it.  v1 has several reports
> from users that this solves the problem they have in the above bz.
> Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (2):
>       iommu: Quirked PCIe bridge test and search function
>       intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge
>
>
>  drivers/iommu/Kconfig               |    5 ++
>  drivers/iommu/Makefile              |    1
>  drivers/iommu/intel-iommu.c         |   77 ++++++++++++++++++++++-------------
>  drivers/iommu/intel_irq_remapping.c |   15 +++++--
>  drivers/iommu/pci.c                 |   69 +++++++++++++++++++++++++++++++
>  drivers/iommu/pci.h                 |   23 ++++++++++
>  6 files changed, 157 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/iommu/pci.c

These both look OK to me, for whatever that's worth (since they don't
touch drivers/pci any more :)).

I assume you probably removed the only users of
pci_find_upstream_pcie_bridge(), right?  If so, and you want to remove
that in this same series, you can add my Acked-by to that.

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-05-28 22:09   ` Bjorn Helgaas
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2013-05-28 22:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA, Stephen Hemminger,
	open list:INTEL IOMMU (VT-d),
	David Woodhouse

On Tue, May 28, 2013 at 12:40 PM, Alex Williamson
<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> This series tries to address:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=44881
>
> Where pci_find_upstream_pcie_bridge() gets lost trying to find the
> upstream PCIe-to-PCI bridge for a device because the bridge doesn't
> expose a PCIe capability.  To do this, we add a iommu_pci_is_pcie_bridge
> function which includes a quirk to look to the next upstream device
> as a sanity check.  We can then replace pci_find_upstream_pcie_bridge
> with a function that's a bit more generic and less tied to intel-iommu
> eccentricities.
>
> v2 uses the same logic as v1, but moves the search and match code to
> IOMMU-core since PCI-core doesn't want it.  v1 has several reports
> from users that this solves the problem they have in the above bz.
> Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (2):
>       iommu: Quirked PCIe bridge test and search function
>       intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge
>
>
>  drivers/iommu/Kconfig               |    5 ++
>  drivers/iommu/Makefile              |    1
>  drivers/iommu/intel-iommu.c         |   77 ++++++++++++++++++++++-------------
>  drivers/iommu/intel_irq_remapping.c |   15 +++++--
>  drivers/iommu/pci.c                 |   69 +++++++++++++++++++++++++++++++
>  drivers/iommu/pci.h                 |   23 ++++++++++
>  6 files changed, 157 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/iommu/pci.c

These both look OK to me, for whatever that's worth (since they don't
touch drivers/pci any more :)).

I assume you probably removed the only users of
pci_find_upstream_pcie_bridge(), right?  If so, and you want to remove
that in this same series, you can add my Acked-by to that.

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 3/2] pci: Remove unused pci_find_upstream_pcie_bridge()
@ 2013-05-28 22:53   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 22:53 UTC (permalink / raw)
  To: bhelgaas, iommu, dwmw2, joro; +Cc: stephen, linux-pci, ddutile

This no longer has any users.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---

I figured I'd follow up with this after the series is accepted, but I
won't turn down an pre-emptive ack from Bjorn.

 drivers/pci/search.c |   35 -----------------------------------
 include/linux/pci.h  |   11 -----------
 2 files changed, 46 deletions(-)

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index d0627fa..da2e82e 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -17,41 +17,6 @@
 DECLARE_RWSEM(pci_bus_sem);
 EXPORT_SYMBOL_GPL(pci_bus_sem);
 
-/*
- * find the upstream PCIe-to-PCI bridge of a PCI device
- * if the device is PCIE, return NULL
- * if the device isn't connected to a PCIe bridge (that is its parent is a
- * legacy PCI bridge and the bridge is directly connected to bus 0), return its
- * parent
- */
-struct pci_dev *
-pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
-{
-	struct pci_dev *tmp = NULL;
-
-	if (pci_is_pcie(pdev))
-		return NULL;
-	while (1) {
-		if (pci_is_root_bus(pdev->bus))
-			break;
-		pdev = pdev->bus->self;
-		/* a p2p bridge */
-		if (!pci_is_pcie(pdev)) {
-			tmp = pdev;
-			continue;
-		}
-		/* PCI device should connect to a PCIe bridge */
-		if (pci_pcie_type(pdev) != PCI_EXP_TYPE_PCI_BRIDGE) {
-			/* Busted hardware? */
-			WARN_ON_ONCE(1);
-			return NULL;
-		}
-		return pdev;
-	}
-
-	return tmp;
-}
-
 static struct pci_bus *pci_do_find_bus(struct pci_bus *bus, unsigned char busnr)
 {
 	struct pci_bus* child;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 710067f..890d426 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1851,15 +1851,4 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 }
 #endif
 
-/**
- * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
- * @pdev: the PCI device
- *
- * if the device is PCIE, return NULL
- * if the device isn't connected to a PCIe bridge (that is its parent is a
- * legacy PCI bridge and the bridge is directly connected to bus 0), return its
- * parent
- */
-struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev);
-
 #endif /* LINUX_PCI_H */


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 3/2] pci: Remove unused pci_find_upstream_pcie_bridge()
@ 2013-05-28 22:53   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-05-28 22:53 UTC (permalink / raw)
  To: bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	linux-pci-u79uwXL29TY76Z2rM5mHXA

This no longer has any users.

Signed-off-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---

I figured I'd follow up with this after the series is accepted, but I
won't turn down an pre-emptive ack from Bjorn.

 drivers/pci/search.c |   35 -----------------------------------
 include/linux/pci.h  |   11 -----------
 2 files changed, 46 deletions(-)

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index d0627fa..da2e82e 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -17,41 +17,6 @@
 DECLARE_RWSEM(pci_bus_sem);
 EXPORT_SYMBOL_GPL(pci_bus_sem);
 
-/*
- * find the upstream PCIe-to-PCI bridge of a PCI device
- * if the device is PCIE, return NULL
- * if the device isn't connected to a PCIe bridge (that is its parent is a
- * legacy PCI bridge and the bridge is directly connected to bus 0), return its
- * parent
- */
-struct pci_dev *
-pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
-{
-	struct pci_dev *tmp = NULL;
-
-	if (pci_is_pcie(pdev))
-		return NULL;
-	while (1) {
-		if (pci_is_root_bus(pdev->bus))
-			break;
-		pdev = pdev->bus->self;
-		/* a p2p bridge */
-		if (!pci_is_pcie(pdev)) {
-			tmp = pdev;
-			continue;
-		}
-		/* PCI device should connect to a PCIe bridge */
-		if (pci_pcie_type(pdev) != PCI_EXP_TYPE_PCI_BRIDGE) {
-			/* Busted hardware? */
-			WARN_ON_ONCE(1);
-			return NULL;
-		}
-		return pdev;
-	}
-
-	return tmp;
-}
-
 static struct pci_bus *pci_do_find_bus(struct pci_bus *bus, unsigned char busnr)
 {
 	struct pci_bus* child;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 710067f..890d426 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1851,15 +1851,4 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 }
 #endif
 
-/**
- * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
- * @pdev: the PCI device
- *
- * if the device is PCIE, return NULL
- * if the device isn't connected to a PCIe bridge (that is its parent is a
- * legacy PCI bridge and the bridge is directly connected to bus 0), return its
- * parent
- */
-struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev);
-
 #endif /* LINUX_PCI_H */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-06-20 13:59     ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2013-06-20 13:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, stephen, linux-pci, ddutile

On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> +bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev)
> +{
> +	if (!pdev->subordinate)
> +		return false;
> +
> +	if (pci_is_pcie(pdev))
> +		return true;
> +
> +#ifdef CONFIG_PCI_QUIRKS
> +	/*
> +	 * If we're not on the root bus, look one device upstream of the
> +	 * current device.  If that device is PCIe and is not a PCIe-to-PCI
> +	 * bridge, then the current device is effectively PCIe as it must
> +	 * be the PCIe-to-PCI bridge.  This handles several bridges that
> +	 * violate the PCIe spec by not exposing a PCIe capability:
> +	 * https://bugzilla.kernel.org/show_bug.cgi?id=44881
> +	 */
> +	if (!pci_is_root_bus(pdev->bus)) {
> +		struct pci_dev *parent = pdev->bus->self;
> +
> +		if (pci_is_pcie(parent) &&
> +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> +			return true;
> +	}

Hmm, that looks a bit dangerous. Do we have a list of PCI
vendor/device-ids of these broken bridges to match against instead of
some open-coded heuristics?

That would probably also help to bring this into the PCI code.


	Joerg



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-06-20 13:59     ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2013-06-20 13:59 UTC (permalink / raw)
  To: Alex Williamson
  Cc: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> +bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev)
> +{
> +	if (!pdev->subordinate)
> +		return false;
> +
> +	if (pci_is_pcie(pdev))
> +		return true;
> +
> +#ifdef CONFIG_PCI_QUIRKS
> +	/*
> +	 * If we're not on the root bus, look one device upstream of the
> +	 * current device.  If that device is PCIe and is not a PCIe-to-PCI
> +	 * bridge, then the current device is effectively PCIe as it must
> +	 * be the PCIe-to-PCI bridge.  This handles several bridges that
> +	 * violate the PCIe spec by not exposing a PCIe capability:
> +	 * https://bugzilla.kernel.org/show_bug.cgi?id=44881
> +	 */
> +	if (!pci_is_root_bus(pdev->bus)) {
> +		struct pci_dev *parent = pdev->bus->self;
> +
> +		if (pci_is_pcie(parent) &&
> +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> +			return true;
> +	}

Hmm, that looks a bit dangerous. Do we have a list of PCI
vendor/device-ids of these broken bridges to match against instead of
some open-coded heuristics?

That would probably also help to bring this into the PCI code.


	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-06-20 13:59     ` Joerg Roedel
  (?)
@ 2013-06-20 15:44     ` Alex Williamson
  2013-06-20 16:15         ` Joerg Roedel
  -1 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2013-06-20 15:44 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, dwmw2, stephen, linux-pci, ddutile, Bjorn Helgaas

On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
> On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> > +bool iommu_pci_is_pcie_bridge(struct pci_dev *pdev)
> > +{
> > +	if (!pdev->subordinate)
> > +		return false;
> > +
> > +	if (pci_is_pcie(pdev))
> > +		return true;
> > +
> > +#ifdef CONFIG_PCI_QUIRKS
> > +	/*
> > +	 * If we're not on the root bus, look one device upstream of the
> > +	 * current device.  If that device is PCIe and is not a PCIe-to-PCI
> > +	 * bridge, then the current device is effectively PCIe as it must
> > +	 * be the PCIe-to-PCI bridge.  This handles several bridges that
> > +	 * violate the PCIe spec by not exposing a PCIe capability:
> > +	 * https://bugzilla.kernel.org/show_bug.cgi?id=44881
> > +	 */
> > +	if (!pci_is_root_bus(pdev->bus)) {
> > +		struct pci_dev *parent = pdev->bus->self;
> > +
> > +		if (pci_is_pcie(parent) &&
> > +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> > +			return true;
> > +	}
> 
> Hmm, that looks a bit dangerous.

How so?  The algorithm seems pretty simple and logical.

>  Do we have a list of PCI
> vendor/device-ids of these broken bridges to match against instead of
> some open-coded heuristics?
> 
> That would probably also help to bring this into the PCI code.

Actually, I believe Bjorn rejected the idea of a fixed list because this
problem is detectable.  He also doesn't want me messing with quirks to
pci_is_pcie() in PCI because he wants a 1:1 relation between that and
having a PCIe capability.  So, I'm stuck and this is where it's ended
up.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-06-20 16:15         ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2013-06-20 16:15 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, stephen, linux-pci, ddutile, Bjorn Helgaas

On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> > > +	if (!pci_is_root_bus(pdev->bus)) {
> > > +		struct pci_dev *parent = pdev->bus->self;
> > > +
> > > +		if (pci_is_pcie(parent) &&
> > > +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> > > +			return true;
> > > +	}
> > 
> > Hmm, that looks a bit dangerous.
> 
> How so?  The algorithm seems pretty simple and logical.

It is simple, but it is still a heuristic that may fail at some point,
no?

> Actually, I believe Bjorn rejected the idea of a fixed list because this
> problem is detectable.  He also doesn't want me messing with quirks to
> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
> having a PCIe capability.  So, I'm stuck and this is where it's ended
> up.  Thanks,

I think implementing such a list is much safer.

Bjorn, why didn't you like that idea?


	Joerg



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-06-20 16:15         ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2013-06-20 16:15 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Bjorn Helgaas,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> > > +	if (!pci_is_root_bus(pdev->bus)) {
> > > +		struct pci_dev *parent = pdev->bus->self;
> > > +
> > > +		if (pci_is_pcie(parent) &&
> > > +		    pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> > > +			return true;
> > > +	}
> > 
> > Hmm, that looks a bit dangerous.
> 
> How so?  The algorithm seems pretty simple and logical.

It is simple, but it is still a heuristic that may fail at some point,
no?

> Actually, I believe Bjorn rejected the idea of a fixed list because this
> problem is detectable.  He also doesn't want me messing with quirks to
> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
> having a PCIe capability.  So, I'm stuck and this is where it's ended
> up.  Thanks,

I think implementing such a list is much safer.

Bjorn, why didn't you like that idea?


	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-06-20 16:15         ` Joerg Roedel
  (?)
@ 2013-06-26  4:20         ` Bjorn Helgaas
  2013-06-26 18:45             ` Alex Williamson
  -1 siblings, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2013-06-26  4:20 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alex Williamson, open list:INTEL IOMMU (VT-d),
	David Woodhouse, Stephen Hemminger, linux-pci, Don Dutile

On Thu, Jun 20, 2013 at 10:15 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
>> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
>> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
>> > > + if (!pci_is_root_bus(pdev->bus)) {
>> > > +         struct pci_dev *parent = pdev->bus->self;
>> > > +
>> > > +         if (pci_is_pcie(parent) &&
>> > > +             pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>> > > +                 return true;
>> > > + }
>> >
>> > Hmm, that looks a bit dangerous.
>>
>> How so?  The algorithm seems pretty simple and logical.
>
> It is simple, but it is still a heuristic that may fail at some point,
> no?
>
>> Actually, I believe Bjorn rejected the idea of a fixed list because this
>> problem is detectable.  He also doesn't want me messing with quirks to
>> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
>> having a PCIe capability.  So, I'm stuck and this is where it's ended
>> up.  Thanks,
>
> I think implementing such a list is much safer.
>
> Bjorn, why didn't you like that idea?

Sorry, I can't remember, and I haven't been able to find the
discussion where I said that.  I think the current patches are all in
drivers/iommu, and if a list makes sense there, it's fine with me.

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-06-26 18:45             ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-06-26 18:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Joerg Roedel, open list:INTEL IOMMU (VT-d),
	David Woodhouse, Stephen Hemminger, linux-pci, Don Dutile

On Tue, 2013-06-25 at 22:20 -0600, Bjorn Helgaas wrote:
> On Thu, Jun 20, 2013 at 10:15 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
> >> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
> >> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> >> > > + if (!pci_is_root_bus(pdev->bus)) {
> >> > > +         struct pci_dev *parent = pdev->bus->self;
> >> > > +
> >> > > +         if (pci_is_pcie(parent) &&
> >> > > +             pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> >> > > +                 return true;
> >> > > + }
> >> >
> >> > Hmm, that looks a bit dangerous.
> >>
> >> How so?  The algorithm seems pretty simple and logical.
> >
> > It is simple, but it is still a heuristic that may fail at some point,
> > no?
> >
> >> Actually, I believe Bjorn rejected the idea of a fixed list because this
> >> problem is detectable.  He also doesn't want me messing with quirks to
> >> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
> >> having a PCIe capability.  So, I'm stuck and this is where it's ended
> >> up.  Thanks,
> >
> > I think implementing such a list is much safer.
> >
> > Bjorn, why didn't you like that idea?
> 
> Sorry, I can't remember, and I haven't been able to find the
> discussion where I said that.  I think the current patches are all in
> drivers/iommu, and if a list makes sense there, it's fine with me.

Here's the comment I remember

https://bugzilla.kernel.org/show_bug.cgi?id=44881#c7

        Comment #7 From Bjorn Helgaas 2012-08-23 15:58:39
        [snip]
        I doubt the upstream device is at fault.  More likely the
        downstream device is really a PCIe device (a PCIe-to-PCI bridge)
        but just fails to report a PCIe capability.  I think this
        situation is likely too common to deal with via quirks, so we'll
        have to figure out a way to just make this work.

Thanks,
Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
@ 2013-06-26 18:45             ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-06-26 18:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA, Stephen Hemminger,
	open list:INTEL IOMMU (VT-d),
	David Woodhouse

On Tue, 2013-06-25 at 22:20 -0600, Bjorn Helgaas wrote:
> On Thu, Jun 20, 2013 at 10:15 AM, Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
> > On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
> >> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
> >> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
> >> > > + if (!pci_is_root_bus(pdev->bus)) {
> >> > > +         struct pci_dev *parent = pdev->bus->self;
> >> > > +
> >> > > +         if (pci_is_pcie(parent) &&
> >> > > +             pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> >> > > +                 return true;
> >> > > + }
> >> >
> >> > Hmm, that looks a bit dangerous.
> >>
> >> How so?  The algorithm seems pretty simple and logical.
> >
> > It is simple, but it is still a heuristic that may fail at some point,
> > no?
> >
> >> Actually, I believe Bjorn rejected the idea of a fixed list because this
> >> problem is detectable.  He also doesn't want me messing with quirks to
> >> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
> >> having a PCIe capability.  So, I'm stuck and this is where it's ended
> >> up.  Thanks,
> >
> > I think implementing such a list is much safer.
> >
> > Bjorn, why didn't you like that idea?
> 
> Sorry, I can't remember, and I haven't been able to find the
> discussion where I said that.  I think the current patches are all in
> drivers/iommu, and if a list makes sense there, it's fine with me.

Here's the comment I remember

https://bugzilla.kernel.org/show_bug.cgi?id=44881#c7

        Comment #7 From Bjorn Helgaas 2012-08-23 15:58:39
        [snip]
        I doubt the upstream device is at fault.  More likely the
        downstream device is really a PCIe device (a PCIe-to-PCI bridge)
        but just fails to report a PCIe capability.  I think this
        situation is likely too common to deal with via quirks, so we'll
        have to figure out a way to just make this work.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function
  2013-06-26 18:45             ` Alex Williamson
  (?)
@ 2013-06-26 19:11             ` Bjorn Helgaas
  -1 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2013-06-26 19:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, open list:INTEL IOMMU (VT-d),
	David Woodhouse, Stephen Hemminger, linux-pci, Don Dutile

On Wed, Jun 26, 2013 at 12:45 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-06-25 at 22:20 -0600, Bjorn Helgaas wrote:
>> On Thu, Jun 20, 2013 at 10:15 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> > On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
>> >> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
>> >> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
>> >> > > + if (!pci_is_root_bus(pdev->bus)) {
>> >> > > +         struct pci_dev *parent = pdev->bus->self;
>> >> > > +
>> >> > > +         if (pci_is_pcie(parent) &&
>> >> > > +             pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>> >> > > +                 return true;
>> >> > > + }
>> >> >
>> >> > Hmm, that looks a bit dangerous.
>> >>
>> >> How so?  The algorithm seems pretty simple and logical.
>> >
>> > It is simple, but it is still a heuristic that may fail at some point,
>> > no?
>> >
>> >> Actually, I believe Bjorn rejected the idea of a fixed list because this
>> >> problem is detectable.  He also doesn't want me messing with quirks to
>> >> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
>> >> having a PCIe capability.  So, I'm stuck and this is where it's ended
>> >> up.  Thanks,
>> >
>> > I think implementing such a list is much safer.
>> >
>> > Bjorn, why didn't you like that idea?
>>
>> Sorry, I can't remember, and I haven't been able to find the
>> discussion where I said that.  I think the current patches are all in
>> drivers/iommu, and if a list makes sense there, it's fine with me.
>
> Here's the comment I remember
>
> https://bugzilla.kernel.org/show_bug.cgi?id=44881#c7
>
>         Comment #7 From Bjorn Helgaas 2012-08-23 15:58:39
>         [snip]
>         I doubt the upstream device is at fault.  More likely the
>         downstream device is really a PCIe device (a PCIe-to-PCI bridge)
>         but just fails to report a PCIe capability.  I think this
>         situation is likely too common to deal with via quirks, so we'll
>         have to figure out a way to just make this work.

OK, I remember that now.  So the question is whether you want a list
or a set of quirks that may be an ongoing maintenance burden, or
whether you want an algorithm that may be risky but possibly less
maintenance.  I preferred the latter.  I think a failure in the
algorithm will most likely result in a device that just doesn't work
(because we derived a DMA source ID that doesn't match what the IOMMU
sees), so at least the impact is relatively minor, and no worse than a
missing entry in the list of exception devices.

Bjorn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-08 17:07   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-07-08 17:07 UTC (permalink / raw)
  To: iommu; +Cc: dwmw2, joro, stephen, linux-pci, ddutile

Joerg,

Where do we stand on this series?  You had a concern that the heuristic
used in patch 1/ could be dangerous.  The suggestion for detecting the
issue was actually from Bjorn who replied with his rationale.  Do you
want to go in the direction of a fixed whitelist or do you agree that
even if the heuristic breaks it provides better behavior than what we
have now?  Thanks,

Alex


On Tue, 2013-05-28 at 12:40 -0600, Alex Williamson wrote:
> This series tries to address:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=44881
> 
> Where pci_find_upstream_pcie_bridge() gets lost trying to find the
> upstream PCIe-to-PCI bridge for a device because the bridge doesn't
> expose a PCIe capability.  To do this, we add a iommu_pci_is_pcie_bridge
> function which includes a quirk to look to the next upstream device
> as a sanity check.  We can then replace pci_find_upstream_pcie_bridge
> with a function that's a bit more generic and less tied to intel-iommu
> eccentricities.
> 
> v2 uses the same logic as v1, but moves the search and match code to
> IOMMU-core since PCI-core doesn't want it.  v1 has several reports
> from users that this solves the problem they have in the above bz.
> Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (2):
>       iommu: Quirked PCIe bridge test and search function
>       intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge
> 
> 
>  drivers/iommu/Kconfig               |    5 ++
>  drivers/iommu/Makefile              |    1 
>  drivers/iommu/intel-iommu.c         |   77 ++++++++++++++++++++++-------------
>  drivers/iommu/intel_irq_remapping.c |   15 +++++--
>  drivers/iommu/pci.c                 |   69 +++++++++++++++++++++++++++++++
>  drivers/iommu/pci.h                 |   23 ++++++++++
>  6 files changed, 157 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/iommu/pci.c




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-08 17:07   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-07-08 17:07 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

Joerg,

Where do we stand on this series?  You had a concern that the heuristic
used in patch 1/ could be dangerous.  The suggestion for detecting the
issue was actually from Bjorn who replied with his rationale.  Do you
want to go in the direction of a fixed whitelist or do you agree that
even if the heuristic breaks it provides better behavior than what we
have now?  Thanks,

Alex


On Tue, 2013-05-28 at 12:40 -0600, Alex Williamson wrote:
> This series tries to address:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=44881
> 
> Where pci_find_upstream_pcie_bridge() gets lost trying to find the
> upstream PCIe-to-PCI bridge for a device because the bridge doesn't
> expose a PCIe capability.  To do this, we add a iommu_pci_is_pcie_bridge
> function which includes a quirk to look to the next upstream device
> as a sanity check.  We can then replace pci_find_upstream_pcie_bridge
> with a function that's a bit more generic and less tied to intel-iommu
> eccentricities.
> 
> v2 uses the same logic as v1, but moves the search and match code to
> IOMMU-core since PCI-core doesn't want it.  v1 has several reports
> from users that this solves the problem they have in the above bz.
> Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (2):
>       iommu: Quirked PCIe bridge test and search function
>       intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge
> 
> 
>  drivers/iommu/Kconfig               |    5 ++
>  drivers/iommu/Makefile              |    1 
>  drivers/iommu/intel-iommu.c         |   77 ++++++++++++++++++++++-------------
>  drivers/iommu/intel_irq_remapping.c |   15 +++++--
>  drivers/iommu/pci.c                 |   69 +++++++++++++++++++++++++++++++
>  drivers/iommu/pci.h                 |   23 ++++++++++
>  6 files changed, 157 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/iommu/pci.c

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-08 19:34     ` Bjorn Helgaas
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2013-07-08 19:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joro, stephen, linux-pci, ddutile

On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> Joerg,
> 
> Where do we stand on this series?  You had a concern that the heuristic
> used in patch 1/ could be dangerous.  The suggestion for detecting the
> issue was actually from Bjorn who replied with his rationale.  Do you
> want to go in the direction of a fixed whitelist or do you agree that
> even if the heuristic breaks it provides better behavior than what we
> have now?  Thanks,

I'm trying to take a step back and look at the overall design, not
these specific patches.

IOMMUs translate addresses based on their source, i.e., a PCIe
requester ID.  This is made more complicated by the fact that some
bridges "take ownership" (change the requester ID as they forward
transactions upstream), as well as the fact that conventional PCI has
no requester ID at all.  And some broken devices apparently generate
DMA requests using the requester ID of another device.

We currently deal with this using the pci_find_upstream_pcie_bridge()
and pci_get_dma_source() interfaces, but I think there's too much
assembly required by their users.  pci_find_upstream_pcie_bridge()
callers normally loop through all the bridges between the "upstream
PCIe bridge" and the device, checking for bridges that might take
ownership.  They probably also ought to use pci_get_dma_source() to
account for the broken devices, but most callers don't.

Most of this is PCI-specific stuff that should be of interest to all
IOMMU drivers, and the overall structure of calls and looping should
be the same for all of them, so it would be nice to factor it out
somehow.

The attached patch is guaranteed not to even compile; it's just to
make this idea more concrete.  The basic idea is that since the IOMMU
driver wants to perform some action for each possible requester ID the
IOMMU might see, PCI could provide an iterator
("pci_for_each_requester_id()") to do that.

Bjorn


commit afad51492c6672b96c2b0735600d5695e30f7180
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed Jul 3 16:04:26 2013 -0600

    pci-add-for-each-requester-id

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index d0627fa..380eb03 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -17,6 +17,89 @@
 DECLARE_RWSEM(pci_bus_sem);
 EXPORT_SYMBOL_GPL(pci_bus_sem);
 
+#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
+#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
+
+static inline bool pci_is_pcix(struct pci_dev *dev)
+{
+	return !!pci_pcix_cap(dev);	/* XXX not implemented */
+}
+
+static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
+{
+	/*
+	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
+	 * Spec v1.0, sec 2.3.
+	 */
+	if (pci_is_pcie(bridge) &&
+	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
+		return true;
+
+	/*
+	 * A PCI-X to PCI-X bridge need not take ownership because there
+	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
+	 * device is added on the secondary bus, the bridge must revert to
+	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
+	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
+	 * tolerate a future hot-add without having to change existing
+	 * IOMMU mappings.
+	 */
+	if (pci_is_pcix(bridge))
+		return true;
+
+	return false;
+}
+
+static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
+					 struct pci_dev *dev)
+{
+	struct pci_dev *bridge;
+	struct pci_bus *child_bus;
+	u8 secondary, subordinate, busn = dev->bus->number;
+
+	if (dev->bus == bus)
+		return dev;
+
+	/*
+	 * There may be several devices on "bus".  Find the one that is a
+	 * bridge leading to "dev".
+	 */
+	list_for_each_entry(bridge, &bus->devices, bus_list) {
+		child_bus = bridge->subordinate;
+		if (child_bus) {
+			secondary = child_bus->busn_res.start;
+			subordinate = child_bus->busn_res.end;
+			if (secondary <= busn && busn <= subordinate)
+				return bridge;
+		}
+	}
+	return NULL;
+}
+
+int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
+			      int (*fn)(struct pci_dev *, void *),
+			      void *data)
+{
+	int ret;
+
+	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */
+
+	while (bridge != dev) {
+		if (pci_bridge_may_take_ownership(bridge)) {
+			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
+			if (ret)
+				return ret;
+		}
+		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
+	}
+
+	if (pci_is_pcie(dev) || pci_is_pcix(dev))
+		return fn(dev, PCI_REQUESTER_ID(dev), data);
+
+	/* Conventional PCI has no requester ID */
+	return 0;
+}
+
 /*
  * find the upstream PCIe-to-PCI bridge of a PCI device
  * if the device is PCIE, return NULL
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3a24e4f..cbcc82f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
 }
 struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
 int pci_dev_present(const struct pci_device_id *ids);
+void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
+			       int (*fn)(struct pci_dev *, void *), void *data);
 
 int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
 			     int where, u8 *val);

commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed Jul 3 12:02:15 2013 -0600

    intel-iommu-upstream-pcie

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b91564d..bdb6e6a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 	return 0;
 }
 
+struct domain_context_info {
+	struct dmar_domain *domain;
+	struct pci_dev *dev;
+	int translation;
+	struct intel_iommu *iomumu;
+	bool mapped;
+};
+
+static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	int segment = pci_domain_nr(data->dev);
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+
+	return domain_context_mapping_one(data->domain, segment,
+					  bus, devfn, data->translation);
+}
+
 static int domain_context_mapping(struct dmar_domain *domain,
 				  struct pci_dev *pdev, int translation)
 {
-	int ret;
-	struct pci_dev *tmp, *parent;
+	struct domain_context_info data;
+	struct pci_dev *bridge;
 
-	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
-					 pdev->bus->number, pdev->devfn,
-					 translation);
-	if (ret)
-		return ret;
+	data.domain = domain;
+	data.dev = pdev;
+	data.translation = translation;
+	bridge = xx;
+	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
+}
 
-	/* dependent device mapping */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (!tmp)
-		return 0;
-	/* Secondary interface's bus number and devfn 0 */
-	parent = pdev->bus->self;
-	while (parent != tmp) {
-		ret = domain_context_mapping_one(domain,
-						 pci_domain_nr(parent->bus),
-						 parent->bus->number,
-						 parent->devfn, translation);
-		if (ret)
-			return ret;
-		parent = parent->bus->self;
-	}
-	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
-		return domain_context_mapping_one(domain,
-					pci_domain_nr(tmp->subordinate),
-					tmp->subordinate->number, 0,
-					translation);
-	else /* this is a legacy PCI bridge */
-		return domain_context_mapping_one(domain,
-						  pci_domain_nr(tmp->bus),
-						  tmp->bus->number,
-						  tmp->devfn,
-						  translation);
+static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+	bool ret;
+
+	ret = device_context_mapped(dev_info->iommu, bus, devfn);
+	if (!ret)
+		dev_info->mapped = false;
+
+	return 0;
 }
 
 static bool domain_context_mapped(struct pci_dev *pdev)
 {
-	bool ret;
-	struct pci_dev *tmp, *parent;
+	struct domain_context_info data;
 	struct intel_iommu *iommu;
+	struct pci_dev *bridge;
 
 	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
 				pdev->devfn);
 	if (!iommu)
 		return false;
 
-	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
-	if (!ret)
-		return false;
-	/* dependent device mapping */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (!tmp)
-		return true;
-	/* Secondary interface's bus number and devfn 0 */
-	parent = pdev->bus->self;
-	while (parent != tmp) {
-		ret = device_context_mapped(iommu, parent->bus->number,
-					    parent->devfn);
-		if (!ret)
-			return false;
-		parent = parent->bus->self;
-	}
-	if (pci_is_pcie(tmp))
-		return device_context_mapped(iommu, tmp->subordinate->number,
-					     0);
-	else
-		return device_context_mapped(iommu, tmp->bus->number,
-					     tmp->devfn);
+	data.iommu = iommu;
+	data.mapped = true;
+	bridge = xx;
+	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
+	return data.mapped;
 }
 
 /* Returns a number of VTD pages, but aligned to MM page size */
@@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
 	return NULL;
 }
 
+static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	int segment = pci_domain_nr(data->dev);
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+	unsigned long flags;
+	struct device_domain_info *info;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &device_domain_list, global) {
+		if (info->segment == segment &&
+		    info->bus == bus && info->devfn == devfn) {
+			data->domain = info->domain;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+	return 0;
+}
+
 /* domain is initialized */
 static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 {
@@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	struct intel_iommu *iommu;
 	struct dmar_drhd_unit *drhd;
 	struct device_domain_info *info, *tmp;
-	struct pci_dev *dev_tmp;
 	unsigned long flags;
 	int bus = 0, devfn = 0;
 	int segment;
 	int ret;
+	struct domain_context_info data;
 
 	domain = find_domain(pdev);
 	if (domain)
@@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 
 	segment = pci_domain_nr(pdev->bus);
 
-	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (dev_tmp) {
-		if (pci_is_pcie(dev_tmp)) {
-			bus = dev_tmp->subordinate->number;
-			devfn = 0;
-		} else {
-			bus = dev_tmp->bus->number;
-			devfn = dev_tmp->devfn;
-		}
-		spin_lock_irqsave(&device_domain_lock, flags);
-		list_for_each_entry(info, &device_domain_list, global) {
-			if (info->segment == segment &&
-			    info->bus == bus && info->devfn == devfn) {
-				found = info->domain;
-				break;
-			}
-		}
-		spin_unlock_irqrestore(&device_domain_lock, flags);
-		/* pcie-pci bridge already has a domain, uses it */
-		if (found) {
-			domain = found;
-			goto found_domain;
-		}
+	data.domain = NULL;
+	data.dev = pdev;
+	bridge = xx;
+	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
+	if (data.domain) {
+		domain = data.domain;
+		goto found_domain;
 	}
 
 	domain = alloc_domain();
@@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
 	return 0;
 }
 
+static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	struct intel_iommu *iommu = data->iommu;
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+
+	iommu_detach_dev(iommu, bus, devfn);
+	return 0;
+}
+
 static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
 					   struct pci_dev *pdev)
 {
-	struct pci_dev *tmp, *parent;
+	struct domain_context_info data;
 
 	if (!iommu || !pdev)
 		return;
 
 	/* dependent device detach */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	/* Secondary interface's bus number and devfn 0 */
-	if (tmp) {
-		parent = pdev->bus->self;
-		while (parent != tmp) {
-			iommu_detach_dev(iommu, parent->bus->number,
-					 parent->devfn);
-			parent = parent->bus->self;
-		}
-		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
-			iommu_detach_dev(iommu,
-				tmp->subordinate->number, 0);
-		else /* this is a legacy PCI bridge */
-			iommu_detach_dev(iommu, tmp->bus->number,
-					 tmp->devfn);
-	}
+	data.iommu = iommu;
+	bridge = xx;
+	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
 }
 
 static void domain_remove_one_dev_info(struct dmar_domain *domain,

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-08 19:34     ` Bjorn Helgaas
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2013-07-08 19:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> Joerg,
> 
> Where do we stand on this series?  You had a concern that the heuristic
> used in patch 1/ could be dangerous.  The suggestion for detecting the
> issue was actually from Bjorn who replied with his rationale.  Do you
> want to go in the direction of a fixed whitelist or do you agree that
> even if the heuristic breaks it provides better behavior than what we
> have now?  Thanks,

I'm trying to take a step back and look at the overall design, not
these specific patches.

IOMMUs translate addresses based on their source, i.e., a PCIe
requester ID.  This is made more complicated by the fact that some
bridges "take ownership" (change the requester ID as they forward
transactions upstream), as well as the fact that conventional PCI has
no requester ID at all.  And some broken devices apparently generate
DMA requests using the requester ID of another device.

We currently deal with this using the pci_find_upstream_pcie_bridge()
and pci_get_dma_source() interfaces, but I think there's too much
assembly required by their users.  pci_find_upstream_pcie_bridge()
callers normally loop through all the bridges between the "upstream
PCIe bridge" and the device, checking for bridges that might take
ownership.  They probably also ought to use pci_get_dma_source() to
account for the broken devices, but most callers don't.

Most of this is PCI-specific stuff that should be of interest to all
IOMMU drivers, and the overall structure of calls and looping should
be the same for all of them, so it would be nice to factor it out
somehow.

The attached patch is guaranteed not to even compile; it's just to
make this idea more concrete.  The basic idea is that since the IOMMU
driver wants to perform some action for each possible requester ID the
IOMMU might see, PCI could provide an iterator
("pci_for_each_requester_id()") to do that.

Bjorn


commit afad51492c6672b96c2b0735600d5695e30f7180
Author: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Date:   Wed Jul 3 16:04:26 2013 -0600

    pci-add-for-each-requester-id

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index d0627fa..380eb03 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -17,6 +17,89 @@
 DECLARE_RWSEM(pci_bus_sem);
 EXPORT_SYMBOL_GPL(pci_bus_sem);
 
+#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
+#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
+
+static inline bool pci_is_pcix(struct pci_dev *dev)
+{
+	return !!pci_pcix_cap(dev);	/* XXX not implemented */
+}
+
+static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
+{
+	/*
+	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
+	 * Spec v1.0, sec 2.3.
+	 */
+	if (pci_is_pcie(bridge) &&
+	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
+		return true;
+
+	/*
+	 * A PCI-X to PCI-X bridge need not take ownership because there
+	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
+	 * device is added on the secondary bus, the bridge must revert to
+	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
+	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
+	 * tolerate a future hot-add without having to change existing
+	 * IOMMU mappings.
+	 */
+	if (pci_is_pcix(bridge))
+		return true;
+
+	return false;
+}
+
+static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
+					 struct pci_dev *dev)
+{
+	struct pci_dev *bridge;
+	struct pci_bus *child_bus;
+	u8 secondary, subordinate, busn = dev->bus->number;
+
+	if (dev->bus == bus)
+		return dev;
+
+	/*
+	 * There may be several devices on "bus".  Find the one that is a
+	 * bridge leading to "dev".
+	 */
+	list_for_each_entry(bridge, &bus->devices, bus_list) {
+		child_bus = bridge->subordinate;
+		if (child_bus) {
+			secondary = child_bus->busn_res.start;
+			subordinate = child_bus->busn_res.end;
+			if (secondary <= busn && busn <= subordinate)
+				return bridge;
+		}
+	}
+	return NULL;
+}
+
+int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
+			      int (*fn)(struct pci_dev *, void *),
+			      void *data)
+{
+	int ret;
+
+	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */
+
+	while (bridge != dev) {
+		if (pci_bridge_may_take_ownership(bridge)) {
+			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
+			if (ret)
+				return ret;
+		}
+		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
+	}
+
+	if (pci_is_pcie(dev) || pci_is_pcix(dev))
+		return fn(dev, PCI_REQUESTER_ID(dev), data);
+
+	/* Conventional PCI has no requester ID */
+	return 0;
+}
+
 /*
  * find the upstream PCIe-to-PCI bridge of a PCI device
  * if the device is PCIE, return NULL
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3a24e4f..cbcc82f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
 }
 struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
 int pci_dev_present(const struct pci_device_id *ids);
+void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
+			       int (*fn)(struct pci_dev *, void *), void *data);
 
 int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
 			     int where, u8 *val);

commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
Author: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Date:   Wed Jul 3 12:02:15 2013 -0600

    intel-iommu-upstream-pcie

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b91564d..bdb6e6a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 	return 0;
 }
 
+struct domain_context_info {
+	struct dmar_domain *domain;
+	struct pci_dev *dev;
+	int translation;
+	struct intel_iommu *iomumu;
+	bool mapped;
+};
+
+static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	int segment = pci_domain_nr(data->dev);
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+
+	return domain_context_mapping_one(data->domain, segment,
+					  bus, devfn, data->translation);
+}
+
 static int domain_context_mapping(struct dmar_domain *domain,
 				  struct pci_dev *pdev, int translation)
 {
-	int ret;
-	struct pci_dev *tmp, *parent;
+	struct domain_context_info data;
+	struct pci_dev *bridge;
 
-	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
-					 pdev->bus->number, pdev->devfn,
-					 translation);
-	if (ret)
-		return ret;
+	data.domain = domain;
+	data.dev = pdev;
+	data.translation = translation;
+	bridge = xx;
+	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
+}
 
-	/* dependent device mapping */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (!tmp)
-		return 0;
-	/* Secondary interface's bus number and devfn 0 */
-	parent = pdev->bus->self;
-	while (parent != tmp) {
-		ret = domain_context_mapping_one(domain,
-						 pci_domain_nr(parent->bus),
-						 parent->bus->number,
-						 parent->devfn, translation);
-		if (ret)
-			return ret;
-		parent = parent->bus->self;
-	}
-	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
-		return domain_context_mapping_one(domain,
-					pci_domain_nr(tmp->subordinate),
-					tmp->subordinate->number, 0,
-					translation);
-	else /* this is a legacy PCI bridge */
-		return domain_context_mapping_one(domain,
-						  pci_domain_nr(tmp->bus),
-						  tmp->bus->number,
-						  tmp->devfn,
-						  translation);
+static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+	bool ret;
+
+	ret = device_context_mapped(dev_info->iommu, bus, devfn);
+	if (!ret)
+		dev_info->mapped = false;
+
+	return 0;
 }
 
 static bool domain_context_mapped(struct pci_dev *pdev)
 {
-	bool ret;
-	struct pci_dev *tmp, *parent;
+	struct domain_context_info data;
 	struct intel_iommu *iommu;
+	struct pci_dev *bridge;
 
 	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
 				pdev->devfn);
 	if (!iommu)
 		return false;
 
-	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
-	if (!ret)
-		return false;
-	/* dependent device mapping */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (!tmp)
-		return true;
-	/* Secondary interface's bus number and devfn 0 */
-	parent = pdev->bus->self;
-	while (parent != tmp) {
-		ret = device_context_mapped(iommu, parent->bus->number,
-					    parent->devfn);
-		if (!ret)
-			return false;
-		parent = parent->bus->self;
-	}
-	if (pci_is_pcie(tmp))
-		return device_context_mapped(iommu, tmp->subordinate->number,
-					     0);
-	else
-		return device_context_mapped(iommu, tmp->bus->number,
-					     tmp->devfn);
+	data.iommu = iommu;
+	data.mapped = true;
+	bridge = xx;
+	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
+	return data.mapped;
 }
 
 /* Returns a number of VTD pages, but aligned to MM page size */
@@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
 	return NULL;
 }
 
+static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	int segment = pci_domain_nr(data->dev);
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+	unsigned long flags;
+	struct device_domain_info *info;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &device_domain_list, global) {
+		if (info->segment == segment &&
+		    info->bus == bus && info->devfn == devfn) {
+			data->domain = info->domain;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+	return 0;
+}
+
 /* domain is initialized */
 static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 {
@@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 	struct intel_iommu *iommu;
 	struct dmar_drhd_unit *drhd;
 	struct device_domain_info *info, *tmp;
-	struct pci_dev *dev_tmp;
 	unsigned long flags;
 	int bus = 0, devfn = 0;
 	int segment;
 	int ret;
+	struct domain_context_info data;
 
 	domain = find_domain(pdev);
 	if (domain)
@@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 
 	segment = pci_domain_nr(pdev->bus);
 
-	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
-	if (dev_tmp) {
-		if (pci_is_pcie(dev_tmp)) {
-			bus = dev_tmp->subordinate->number;
-			devfn = 0;
-		} else {
-			bus = dev_tmp->bus->number;
-			devfn = dev_tmp->devfn;
-		}
-		spin_lock_irqsave(&device_domain_lock, flags);
-		list_for_each_entry(info, &device_domain_list, global) {
-			if (info->segment == segment &&
-			    info->bus == bus && info->devfn == devfn) {
-				found = info->domain;
-				break;
-			}
-		}
-		spin_unlock_irqrestore(&device_domain_lock, flags);
-		/* pcie-pci bridge already has a domain, uses it */
-		if (found) {
-			domain = found;
-			goto found_domain;
-		}
+	data.domain = NULL;
+	data.dev = pdev;
+	bridge = xx;
+	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
+	if (data.domain) {
+		domain = data.domain;
+		goto found_domain;
 	}
 
 	domain = alloc_domain();
@@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
 	return 0;
 }
 
+static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
+{
+	struct domain_context_info *dev_info = data;
+	struct intel_iommu *iommu = data->iommu;
+	u8 bus = data->requester_id >> 8;
+	u8 devfn = data->requester_id & 0xff;
+
+	iommu_detach_dev(iommu, bus, devfn);
+	return 0;
+}
+
 static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
 					   struct pci_dev *pdev)
 {
-	struct pci_dev *tmp, *parent;
+	struct domain_context_info data;
 
 	if (!iommu || !pdev)
 		return;
 
 	/* dependent device detach */
-	tmp = pci_find_upstream_pcie_bridge(pdev);
-	/* Secondary interface's bus number and devfn 0 */
-	if (tmp) {
-		parent = pdev->bus->self;
-		while (parent != tmp) {
-			iommu_detach_dev(iommu, parent->bus->number,
-					 parent->devfn);
-			parent = parent->bus->self;
-		}
-		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
-			iommu_detach_dev(iommu,
-				tmp->subordinate->number, 0);
-		else /* this is a legacy PCI bridge */
-			iommu_detach_dev(iommu, tmp->bus->number,
-					 tmp->devfn);
-	}
+	data.iommu = iommu;
+	bridge = xx;
+	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
 }
 
 static void domain_remove_one_dev_info(struct dmar_domain *domain,

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-08 20:49       ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-07-08 20:49 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: iommu, dwmw2, joro, stephen, linux-pci, ddutile

On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > Joerg,
> > 
> > Where do we stand on this series?  You had a concern that the heuristic
> > used in patch 1/ could be dangerous.  The suggestion for detecting the
> > issue was actually from Bjorn who replied with his rationale.  Do you
> > want to go in the direction of a fixed whitelist or do you agree that
> > even if the heuristic breaks it provides better behavior than what we
> > have now?  Thanks,
> 
> I'm trying to take a step back and look at the overall design, not
> these specific patches.
> 
> IOMMUs translate addresses based on their source, i.e., a PCIe
> requester ID.  This is made more complicated by the fact that some
> bridges "take ownership" (change the requester ID as they forward
> transactions upstream), as well as the fact that conventional PCI has
> no requester ID at all.  And some broken devices apparently generate
> DMA requests using the requester ID of another device.
> 
> We currently deal with this using the pci_find_upstream_pcie_bridge()
> and pci_get_dma_source() interfaces, but I think there's too much
> assembly required by their users.  pci_find_upstream_pcie_bridge()
> callers normally loop through all the bridges between the "upstream
> PCIe bridge" and the device, checking for bridges that might take
> ownership.  They probably also ought to use pci_get_dma_source() to
> account for the broken devices, but most callers don't.
> 
> Most of this is PCI-specific stuff that should be of interest to all
> IOMMU drivers, and the overall structure of calls and looping should
> be the same for all of them, so it would be nice to factor it out
> somehow.
> 
> The attached patch is guaranteed not to even compile; it's just to
> make this idea more concrete.  The basic idea is that since the IOMMU
> driver wants to perform some action for each possible requester ID the
> IOMMU might see, PCI could provide an iterator
> ("pci_for_each_requester_id()") to do that.
> 
> Bjorn
> 
> 
> commit afad51492c6672b96c2b0735600d5695e30f7180
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Wed Jul 3 16:04:26 2013 -0600
> 
>     pci-add-for-each-requester-id
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index d0627fa..380eb03 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -17,6 +17,89 @@
>  DECLARE_RWSEM(pci_bus_sem);
>  EXPORT_SYMBOL_GPL(pci_bus_sem);
>  
> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> +#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
> +
> +static inline bool pci_is_pcix(struct pci_dev *dev)
> +{
> +	return !!pci_pcix_cap(dev);	/* XXX not implemented */
> +}
> +
> +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> +{
> +	/*
> +	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> +	 * Spec v1.0, sec 2.3.
> +	 */
> +	if (pci_is_pcie(bridge) &&
> +	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> +		return true;

Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
capability are exactly what causes the current bug.

> +
> +	/*
> +	 * A PCI-X to PCI-X bridge need not take ownership because there
> +	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
> +	 * device is added on the secondary bus, the bridge must revert to
> +	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
> +	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
> +	 * tolerate a future hot-add without having to change existing
> +	 * IOMMU mappings.
> +	 */
> +	if (pci_is_pcix(bridge))
> +		return true;
> +
> +	return false;
> +}
> +
> +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> +					 struct pci_dev *dev)
> +{
> +	struct pci_dev *bridge;
> +	struct pci_bus *child_bus;
> +	u8 secondary, subordinate, busn = dev->bus->number;
> +
> +	if (dev->bus == bus)
> +		return dev;
> +
> +	/*
> +	 * There may be several devices on "bus".  Find the one that is a
> +	 * bridge leading to "dev".
> +	 */
> +	list_for_each_entry(bridge, &bus->devices, bus_list) {
> +		child_bus = bridge->subordinate;
> +		if (child_bus) {
> +			secondary = child_bus->busn_res.start;
> +			subordinate = child_bus->busn_res.end;
> +			if (secondary <= busn && busn <= subordinate)
> +				return bridge;
> +		}
> +	}
> +	return NULL;
> +}
> +
> +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> +			      int (*fn)(struct pci_dev *, void *),


u16 requester_id

> +			      void *data)
> +{
> +	int ret;
> +
> +	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */

It's unfortunate that our dma quirk still seems to only get called for
the end device where we could have bridges that need it too.

> +
> +	while (bridge != dev) {


I'm having a hard time understanding this function since bridge is never
defined in any of the below sample code.  Would bridge be the root port
above a device?  And what about devices connected to the RC?  Seems like
we'd need to traverse up the bus first, so there's a bit of missing
magic.

> +		if (pci_bridge_may_take_ownership(bridge)) {
> +			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
> +			if (ret)
> +				return ret;
> +		}
> +		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
> +	}
> +
> +	if (pci_is_pcie(dev) || pci_is_pcix(dev))
> +		return fn(dev, PCI_REQUESTER_ID(dev), data);
> +
> +	/* Conventional PCI has no requester ID */
> +	return 0;
> +}
> +
>  /*
>   * find the upstream PCIe-to-PCI bridge of a PCI device
>   * if the device is PCIE, return NULL
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3a24e4f..cbcc82f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
>  }
>  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
>  int pci_dev_present(const struct pci_device_id *ids);
> +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> +			       int (*fn)(struct pci_dev *, void *), void *data);
>  
>  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
>  			     int where, u8 *val);
> 
> commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Wed Jul 3 12:02:15 2013 -0600
> 
>     intel-iommu-upstream-pcie

I assume this one would is also going to apply dma quirks to dma_ops,
which I'm all for.

> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index b91564d..bdb6e6a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
>  	return 0;
>  }
>  
> +struct domain_context_info {
> +	struct dmar_domain *domain;
> +	struct pci_dev *dev;
> +	int translation;
> +	struct intel_iommu *iomumu;

s/iomumu/iommu/

> +	bool mapped;
> +};
> +
> +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	int segment = pci_domain_nr(data->dev);

dev_info->dev

> +	u8 bus = data->requester_id >> 8;

s/data->//

> +	u8 devfn = data->requester_id & 0xff;

s/data->//

> +
> +	return domain_context_mapping_one(data->domain, segment,
> +					  bus, devfn, data->translation);
> +}
> +
>  static int domain_context_mapping(struct dmar_domain *domain,
>  				  struct pci_dev *pdev, int translation)
>  {
> -	int ret;
> -	struct pci_dev *tmp, *parent;
> +	struct domain_context_info data;
> +	struct pci_dev *bridge;
>  
> -	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> -					 pdev->bus->number, pdev->devfn,
> -					 translation);
> -	if (ret)
> -		return ret;
> +	data.domain = domain;
> +	data.dev = pdev;
> +	data.translation = translation;
> +	bridge = xx;

Darn, I was hoping you were going to reveal where bridge comes from.

> +	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
> +}
>  
> -	/* dependent device mapping */
> -	tmp = pci_find_upstream_pcie_bridge(pdev);
> -	if (!tmp)
> -		return 0;
> -	/* Secondary interface's bus number and devfn 0 */
> -	parent = pdev->bus->self;
> -	while (parent != tmp) {
> -		ret = domain_context_mapping_one(domain,
> -						 pci_domain_nr(parent->bus),
> -						 parent->bus->number,
> -						 parent->devfn, translation);
> -		if (ret)
> -			return ret;
> -		parent = parent->bus->self;
> -	}
> -	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> -		return domain_context_mapping_one(domain,
> -					pci_domain_nr(tmp->subordinate),
> -					tmp->subordinate->number, 0,
> -					translation);
> -	else /* this is a legacy PCI bridge */
> -		return domain_context_mapping_one(domain,
> -						  pci_domain_nr(tmp->bus),
> -						  tmp->bus->number,
> -						  tmp->devfn,
> -						  translation);
> +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	u8 bus = data->requester_id >> 8;
> +	u8 devfn = data->requester_id & 0xff;
> +	bool ret;
> +
> +	ret = device_context_mapped(dev_info->iommu, bus, devfn);
> +	if (!ret)
> +		dev_info->mapped = false;
> +
> +	return 0;
>  }
>  
>  static bool domain_context_mapped(struct pci_dev *pdev)
>  {
> -	bool ret;
> -	struct pci_dev *tmp, *parent;
> +	struct domain_context_info data;
>  	struct intel_iommu *iommu;
> +	struct pci_dev *bridge;
>  
>  	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
>  				pdev->devfn);
>  	if (!iommu)
>  		return false;
>  
> -	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
> -	if (!ret)
> -		return false;
> -	/* dependent device mapping */
> -	tmp = pci_find_upstream_pcie_bridge(pdev);
> -	if (!tmp)
> -		return true;
> -	/* Secondary interface's bus number and devfn 0 */
> -	parent = pdev->bus->self;
> -	while (parent != tmp) {
> -		ret = device_context_mapped(iommu, parent->bus->number,
> -					    parent->devfn);
> -		if (!ret)
> -			return false;
> -		parent = parent->bus->self;
> -	}
> -	if (pci_is_pcie(tmp))
> -		return device_context_mapped(iommu, tmp->subordinate->number,
> -					     0);
> -	else
> -		return device_context_mapped(iommu, tmp->bus->number,
> -					     tmp->devfn);
> +	data.iommu = iommu;
> +	data.mapped = true;
> +	bridge = xx;
> +	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
> +	return data.mapped;
>  }
>  
>  /* Returns a number of VTD pages, but aligned to MM page size */
> @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
>  	return NULL;
>  }
>  
> +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	int segment = pci_domain_nr(data->dev);
> +	u8 bus = data->requester_id >> 8;
> +	u8 devfn = data->requester_id & 0xff;
> +	unsigned long flags;
> +	struct device_domain_info *info;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	list_for_each_entry(info, &device_domain_list, global) {
> +		if (info->segment == segment &&
> +		    info->bus == bus && info->devfn == devfn) {
> +			data->domain = info->domain;
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(&device_domain_lock, flags);
> +	return 0;
> +}
> +
>  /* domain is initialized */
>  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>  {
> @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>  	struct intel_iommu *iommu;
>  	struct dmar_drhd_unit *drhd;
>  	struct device_domain_info *info, *tmp;
> -	struct pci_dev *dev_tmp;
>  	unsigned long flags;
>  	int bus = 0, devfn = 0;
>  	int segment;
>  	int ret;
> +	struct domain_context_info data;
>  
>  	domain = find_domain(pdev);
>  	if (domain)
> @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>  
>  	segment = pci_domain_nr(pdev->bus);
>  
> -	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> -	if (dev_tmp) {
> -		if (pci_is_pcie(dev_tmp)) {
> -			bus = dev_tmp->subordinate->number;
> -			devfn = 0;
> -		} else {
> -			bus = dev_tmp->bus->number;
> -			devfn = dev_tmp->devfn;
> -		}
> -		spin_lock_irqsave(&device_domain_lock, flags);
> -		list_for_each_entry(info, &device_domain_list, global) {
> -			if (info->segment == segment &&
> -			    info->bus == bus && info->devfn == devfn) {
> -				found = info->domain;
> -				break;
> -			}
> -		}
> -		spin_unlock_irqrestore(&device_domain_lock, flags);
> -		/* pcie-pci bridge already has a domain, uses it */
> -		if (found) {
> -			domain = found;
> -			goto found_domain;
> -		}
> +	data.domain = NULL;
> +	data.dev = pdev;
> +	bridge = xx;
> +	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
> +	if (data.domain) {
> +		domain = data.domain;
> +		goto found_domain;
>  	}
>  
>  	domain = alloc_domain();
> @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
>  	return 0;
>  }
>  
> +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	struct intel_iommu *iommu = data->iommu;
> +	u8 bus = data->requester_id >> 8;
> +	u8 devfn = data->requester_id & 0xff;
> +
> +	iommu_detach_dev(iommu, bus, devfn);
> +	return 0;
> +}
> +
>  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
>  					   struct pci_dev *pdev)
>  {
> -	struct pci_dev *tmp, *parent;
> +	struct domain_context_info data;
>  
>  	if (!iommu || !pdev)
>  		return;
>  
>  	/* dependent device detach */
> -	tmp = pci_find_upstream_pcie_bridge(pdev);
> -	/* Secondary interface's bus number and devfn 0 */
> -	if (tmp) {
> -		parent = pdev->bus->self;
> -		while (parent != tmp) {
> -			iommu_detach_dev(iommu, parent->bus->number,
> -					 parent->devfn);
> -			parent = parent->bus->self;
> -		}
> -		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> -			iommu_detach_dev(iommu,
> -				tmp->subordinate->number, 0);
> -		else /* this is a legacy PCI bridge */
> -			iommu_detach_dev(iommu, tmp->bus->number,
> -					 tmp->devfn);
> -	}
> +	data.iommu = iommu;
> +	bridge = xx;
> +	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
>  }
>  
>  static void domain_remove_one_dev_info(struct dmar_domain *domain,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-08 20:49       ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-07-08 20:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > Joerg,
> > 
> > Where do we stand on this series?  You had a concern that the heuristic
> > used in patch 1/ could be dangerous.  The suggestion for detecting the
> > issue was actually from Bjorn who replied with his rationale.  Do you
> > want to go in the direction of a fixed whitelist or do you agree that
> > even if the heuristic breaks it provides better behavior than what we
> > have now?  Thanks,
> 
> I'm trying to take a step back and look at the overall design, not
> these specific patches.
> 
> IOMMUs translate addresses based on their source, i.e., a PCIe
> requester ID.  This is made more complicated by the fact that some
> bridges "take ownership" (change the requester ID as they forward
> transactions upstream), as well as the fact that conventional PCI has
> no requester ID at all.  And some broken devices apparently generate
> DMA requests using the requester ID of another device.
> 
> We currently deal with this using the pci_find_upstream_pcie_bridge()
> and pci_get_dma_source() interfaces, but I think there's too much
> assembly required by their users.  pci_find_upstream_pcie_bridge()
> callers normally loop through all the bridges between the "upstream
> PCIe bridge" and the device, checking for bridges that might take
> ownership.  They probably also ought to use pci_get_dma_source() to
> account for the broken devices, but most callers don't.
> 
> Most of this is PCI-specific stuff that should be of interest to all
> IOMMU drivers, and the overall structure of calls and looping should
> be the same for all of them, so it would be nice to factor it out
> somehow.
> 
> The attached patch is guaranteed not to even compile; it's just to
> make this idea more concrete.  The basic idea is that since the IOMMU
> driver wants to perform some action for each possible requester ID the
> IOMMU might see, PCI could provide an iterator
> ("pci_for_each_requester_id()") to do that.
> 
> Bjorn
> 
> 
> commit afad51492c6672b96c2b0735600d5695e30f7180
> Author: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Date:   Wed Jul 3 16:04:26 2013 -0600
> 
>     pci-add-for-each-requester-id
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index d0627fa..380eb03 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -17,6 +17,89 @@
>  DECLARE_RWSEM(pci_bus_sem);
>  EXPORT_SYMBOL_GPL(pci_bus_sem);
>  
> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> +#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
> +
> +static inline bool pci_is_pcix(struct pci_dev *dev)
> +{
> +	return !!pci_pcix_cap(dev);	/* XXX not implemented */
> +}
> +
> +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> +{
> +	/*
> +	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> +	 * Spec v1.0, sec 2.3.
> +	 */
> +	if (pci_is_pcie(bridge) &&
> +	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> +		return true;

Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
capability are exactly what causes the current bug.

> +
> +	/*
> +	 * A PCI-X to PCI-X bridge need not take ownership because there
> +	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
> +	 * device is added on the secondary bus, the bridge must revert to
> +	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
> +	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
> +	 * tolerate a future hot-add without having to change existing
> +	 * IOMMU mappings.
> +	 */
> +	if (pci_is_pcix(bridge))
> +		return true;
> +
> +	return false;
> +}
> +
> +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> +					 struct pci_dev *dev)
> +{
> +	struct pci_dev *bridge;
> +	struct pci_bus *child_bus;
> +	u8 secondary, subordinate, busn = dev->bus->number;
> +
> +	if (dev->bus == bus)
> +		return dev;
> +
> +	/*
> +	 * There may be several devices on "bus".  Find the one that is a
> +	 * bridge leading to "dev".
> +	 */
> +	list_for_each_entry(bridge, &bus->devices, bus_list) {
> +		child_bus = bridge->subordinate;
> +		if (child_bus) {
> +			secondary = child_bus->busn_res.start;
> +			subordinate = child_bus->busn_res.end;
> +			if (secondary <= busn && busn <= subordinate)
> +				return bridge;
> +		}
> +	}
> +	return NULL;
> +}
> +
> +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> +			      int (*fn)(struct pci_dev *, void *),


u16 requester_id

> +			      void *data)
> +{
> +	int ret;
> +
> +	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */

It's unfortunate that our dma quirk still seems to only get called for
the end device where we could have bridges that need it too.

> +
> +	while (bridge != dev) {


I'm having a hard time understanding this function since bridge is never
defined in any of the below sample code.  Would bridge be the root port
above a device?  And what about devices connected to the RC?  Seems like
we'd need to traverse up the bus first, so there's a bit of missing
magic.

> +		if (pci_bridge_may_take_ownership(bridge)) {
> +			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
> +			if (ret)
> +				return ret;
> +		}
> +		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
> +	}
> +
> +	if (pci_is_pcie(dev) || pci_is_pcix(dev))
> +		return fn(dev, PCI_REQUESTER_ID(dev), data);
> +
> +	/* Conventional PCI has no requester ID */
> +	return 0;
> +}
> +
>  /*
>   * find the upstream PCIe-to-PCI bridge of a PCI device
>   * if the device is PCIE, return NULL
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3a24e4f..cbcc82f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
>  }
>  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
>  int pci_dev_present(const struct pci_device_id *ids);
> +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> +			       int (*fn)(struct pci_dev *, void *), void *data);
>  
>  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
>  			     int where, u8 *val);
> 
> commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
> Author: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Date:   Wed Jul 3 12:02:15 2013 -0600
> 
>     intel-iommu-upstream-pcie

I assume this one would is also going to apply dma quirks to dma_ops,
which I'm all for.

> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index b91564d..bdb6e6a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
>  	return 0;
>  }
>  
> +struct domain_context_info {
> +	struct dmar_domain *domain;
> +	struct pci_dev *dev;
> +	int translation;
> +	struct intel_iommu *iomumu;

s/iomumu/iommu/

> +	bool mapped;
> +};
> +
> +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	int segment = pci_domain_nr(data->dev);

dev_info->dev

> +	u8 bus = data->requester_id >> 8;

s/data->//

> +	u8 devfn = data->requester_id & 0xff;

s/data->//

> +
> +	return domain_context_mapping_one(data->domain, segment,
> +					  bus, devfn, data->translation);
> +}
> +
>  static int domain_context_mapping(struct dmar_domain *domain,
>  				  struct pci_dev *pdev, int translation)
>  {
> -	int ret;
> -	struct pci_dev *tmp, *parent;
> +	struct domain_context_info data;
> +	struct pci_dev *bridge;
>  
> -	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> -					 pdev->bus->number, pdev->devfn,
> -					 translation);
> -	if (ret)
> -		return ret;
> +	data.domain = domain;
> +	data.dev = pdev;
> +	data.translation = translation;
> +	bridge = xx;

Darn, I was hoping you were going to reveal where bridge comes from.

> +	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
> +}
>  
> -	/* dependent device mapping */
> -	tmp = pci_find_upstream_pcie_bridge(pdev);
> -	if (!tmp)
> -		return 0;
> -	/* Secondary interface's bus number and devfn 0 */
> -	parent = pdev->bus->self;
> -	while (parent != tmp) {
> -		ret = domain_context_mapping_one(domain,
> -						 pci_domain_nr(parent->bus),
> -						 parent->bus->number,
> -						 parent->devfn, translation);
> -		if (ret)
> -			return ret;
> -		parent = parent->bus->self;
> -	}
> -	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> -		return domain_context_mapping_one(domain,
> -					pci_domain_nr(tmp->subordinate),
> -					tmp->subordinate->number, 0,
> -					translation);
> -	else /* this is a legacy PCI bridge */
> -		return domain_context_mapping_one(domain,
> -						  pci_domain_nr(tmp->bus),
> -						  tmp->bus->number,
> -						  tmp->devfn,
> -						  translation);
> +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	u8 bus = data->requester_id >> 8;
> +	u8 devfn = data->requester_id & 0xff;
> +	bool ret;
> +
> +	ret = device_context_mapped(dev_info->iommu, bus, devfn);
> +	if (!ret)
> +		dev_info->mapped = false;
> +
> +	return 0;
>  }
>  
>  static bool domain_context_mapped(struct pci_dev *pdev)
>  {
> -	bool ret;
> -	struct pci_dev *tmp, *parent;
> +	struct domain_context_info data;
>  	struct intel_iommu *iommu;
> +	struct pci_dev *bridge;
>  
>  	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
>  				pdev->devfn);
>  	if (!iommu)
>  		return false;
>  
> -	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
> -	if (!ret)
> -		return false;
> -	/* dependent device mapping */
> -	tmp = pci_find_upstream_pcie_bridge(pdev);
> -	if (!tmp)
> -		return true;
> -	/* Secondary interface's bus number and devfn 0 */
> -	parent = pdev->bus->self;
> -	while (parent != tmp) {
> -		ret = device_context_mapped(iommu, parent->bus->number,
> -					    parent->devfn);
> -		if (!ret)
> -			return false;
> -		parent = parent->bus->self;
> -	}
> -	if (pci_is_pcie(tmp))
> -		return device_context_mapped(iommu, tmp->subordinate->number,
> -					     0);
> -	else
> -		return device_context_mapped(iommu, tmp->bus->number,
> -					     tmp->devfn);
> +	data.iommu = iommu;
> +	data.mapped = true;
> +	bridge = xx;
> +	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
> +	return data.mapped;
>  }
>  
>  /* Returns a number of VTD pages, but aligned to MM page size */
> @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
>  	return NULL;
>  }
>  
> +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	int segment = pci_domain_nr(data->dev);
> +	u8 bus = data->requester_id >> 8;
> +	u8 devfn = data->requester_id & 0xff;
> +	unsigned long flags;
> +	struct device_domain_info *info;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	list_for_each_entry(info, &device_domain_list, global) {
> +		if (info->segment == segment &&
> +		    info->bus == bus && info->devfn == devfn) {
> +			data->domain = info->domain;
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(&device_domain_lock, flags);
> +	return 0;
> +}
> +
>  /* domain is initialized */
>  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>  {
> @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>  	struct intel_iommu *iommu;
>  	struct dmar_drhd_unit *drhd;
>  	struct device_domain_info *info, *tmp;
> -	struct pci_dev *dev_tmp;
>  	unsigned long flags;
>  	int bus = 0, devfn = 0;
>  	int segment;
>  	int ret;
> +	struct domain_context_info data;
>  
>  	domain = find_domain(pdev);
>  	if (domain)
> @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>  
>  	segment = pci_domain_nr(pdev->bus);
>  
> -	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> -	if (dev_tmp) {
> -		if (pci_is_pcie(dev_tmp)) {
> -			bus = dev_tmp->subordinate->number;
> -			devfn = 0;
> -		} else {
> -			bus = dev_tmp->bus->number;
> -			devfn = dev_tmp->devfn;
> -		}
> -		spin_lock_irqsave(&device_domain_lock, flags);
> -		list_for_each_entry(info, &device_domain_list, global) {
> -			if (info->segment == segment &&
> -			    info->bus == bus && info->devfn == devfn) {
> -				found = info->domain;
> -				break;
> -			}
> -		}
> -		spin_unlock_irqrestore(&device_domain_lock, flags);
> -		/* pcie-pci bridge already has a domain, uses it */
> -		if (found) {
> -			domain = found;
> -			goto found_domain;
> -		}
> +	data.domain = NULL;
> +	data.dev = pdev;
> +	bridge = xx;
> +	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
> +	if (data.domain) {
> +		domain = data.domain;
> +		goto found_domain;
>  	}
>  
>  	domain = alloc_domain();
> @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
>  	return 0;
>  }
>  
> +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
> +{
> +	struct domain_context_info *dev_info = data;
> +	struct intel_iommu *iommu = data->iommu;
> +	u8 bus = data->requester_id >> 8;
> +	u8 devfn = data->requester_id & 0xff;
> +
> +	iommu_detach_dev(iommu, bus, devfn);
> +	return 0;
> +}
> +
>  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
>  					   struct pci_dev *pdev)
>  {
> -	struct pci_dev *tmp, *parent;
> +	struct domain_context_info data;
>  
>  	if (!iommu || !pdev)
>  		return;
>  
>  	/* dependent device detach */
> -	tmp = pci_find_upstream_pcie_bridge(pdev);
> -	/* Secondary interface's bus number and devfn 0 */
> -	if (tmp) {
> -		parent = pdev->bus->self;
> -		while (parent != tmp) {
> -			iommu_detach_dev(iommu, parent->bus->number,
> -					 parent->devfn);
> -			parent = parent->bus->self;
> -		}
> -		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> -			iommu_detach_dev(iommu,
> -				tmp->subordinate->number, 0);
> -		else /* this is a legacy PCI bridge */
> -			iommu_detach_dev(iommu, tmp->bus->number,
> -					 tmp->devfn);
> -	}
> +	data.iommu = iommu;
> +	bridge = xx;
> +	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
>  }
>  
>  static void domain_remove_one_dev_info(struct dmar_domain *domain,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
  2013-07-08 20:49       ` Alex Williamson
  (?)
@ 2013-07-08 21:51       ` Bjorn Helgaas
  2013-07-09 18:27           ` Alex Williamson
  -1 siblings, 1 reply; 34+ messages in thread
From: Bjorn Helgaas @ 2013-07-08 21:51 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joro, stephen, linux-pci, ddutile

On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
> On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > > Joerg,
> > > 
> > > Where do we stand on this series?  You had a concern that the heuristic
> > > used in patch 1/ could be dangerous.  The suggestion for detecting the
> > > issue was actually from Bjorn who replied with his rationale.  Do you
> > > want to go in the direction of a fixed whitelist or do you agree that
> > > even if the heuristic breaks it provides better behavior than what we
> > > have now?  Thanks,
> > 
> > I'm trying to take a step back and look at the overall design, not
> > these specific patches.
> > 
> > IOMMUs translate addresses based on their source, i.e., a PCIe
> > requester ID.  This is made more complicated by the fact that some
> > bridges "take ownership" (change the requester ID as they forward
> > transactions upstream), as well as the fact that conventional PCI has
> > no requester ID at all.  And some broken devices apparently generate
> > DMA requests using the requester ID of another device.
> > 
> > We currently deal with this using the pci_find_upstream_pcie_bridge()
> > and pci_get_dma_source() interfaces, but I think there's too much
> > assembly required by their users.  pci_find_upstream_pcie_bridge()
> > callers normally loop through all the bridges between the "upstream
> > PCIe bridge" and the device, checking for bridges that might take
> > ownership.  They probably also ought to use pci_get_dma_source() to
> > account for the broken devices, but most callers don't.
> > 
> > Most of this is PCI-specific stuff that should be of interest to all
> > IOMMU drivers, and the overall structure of calls and looping should
> > be the same for all of them, so it would be nice to factor it out
> > somehow.
> > 
> > The attached patch is guaranteed not to even compile; it's just to
> > make this idea more concrete.  The basic idea is that since the IOMMU
> > driver wants to perform some action for each possible requester ID the
> > IOMMU might see, PCI could provide an iterator
> > ("pci_for_each_requester_id()") to do that.
> > 
> > Bjorn
> > 
> > 
> > commit afad51492c6672b96c2b0735600d5695e30f7180
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Wed Jul 3 16:04:26 2013 -0600
> > 
> >     pci-add-for-each-requester-id
> > 
> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > index d0627fa..380eb03 100644
> > --- a/drivers/pci/search.c
> > +++ b/drivers/pci/search.c
> > @@ -17,6 +17,89 @@
> >  DECLARE_RWSEM(pci_bus_sem);
> >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> >  
> > +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> > +#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
> > +
> > +static inline bool pci_is_pcix(struct pci_dev *dev)
> > +{
> > +	return !!pci_pcix_cap(dev);	/* XXX not implemented */
> > +}
> > +
> > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> > +{
> > +	/*
> > +	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> > +	 * Spec v1.0, sec 2.3.
> > +	 */
> > +	if (pci_is_pcie(bridge) &&
> > +	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> > +		return true;
> 
> Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
> capability are exactly what causes the current bug.

Likely so.  We might be able to identify that as "secondary bus is not
PCIe" or something.

> > +
> > +	/*
> > +	 * A PCI-X to PCI-X bridge need not take ownership because there
> > +	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
> > +	 * device is added on the secondary bus, the bridge must revert to
> > +	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
> > +	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
> > +	 * tolerate a future hot-add without having to change existing
> > +	 * IOMMU mappings.
> > +	 */
> > +	if (pci_is_pcix(bridge))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> > +					 struct pci_dev *dev)
> > +{
> > +	struct pci_dev *bridge;
> > +	struct pci_bus *child_bus;
> > +	u8 secondary, subordinate, busn = dev->bus->number;
> > +
> > +	if (dev->bus == bus)
> > +		return dev;
> > +
> > +	/*
> > +	 * There may be several devices on "bus".  Find the one that is a
> > +	 * bridge leading to "dev".
> > +	 */
> > +	list_for_each_entry(bridge, &bus->devices, bus_list) {
> > +		child_bus = bridge->subordinate;
> > +		if (child_bus) {
> > +			secondary = child_bus->busn_res.start;
> > +			subordinate = child_bus->busn_res.end;
> > +			if (secondary <= busn && busn <= subordinate)
> > +				return bridge;
> > +		}
> > +	}
> > +	return NULL;
> > +}
> > +
> > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > +			      int (*fn)(struct pci_dev *, void *),
> 
> u16 requester_id
> 
> > +			      void *data)
> > +{
> > +	int ret;
> > +
> > +	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */
> 
> It's unfortunate that our dma quirk still seems to only get called for
> the end device where we could have bridges that need it too.

Sure, we can do that if necessary.

I'm not trying to present this as any kind of finished solution.  The
point is to figure out if we can make an interface that's a little
more abstract so we can pull the PCI topology issues out of the IOMMU
drivers, and I just thought something that looked like code would be
more concrete and easier to discuss.

> > +
> > +	while (bridge != dev) {
> 
> 
> I'm having a hard time understanding this function since bridge is never
> defined in any of the below sample code.  Would bridge be the root port
> above a device?  And what about devices connected to the RC?  Seems like
> we'd need to traverse up the bus first, so there's a bit of missing
> magic.

I was assuming that an IOMMU could be either at the root of a PCI
hierarchy or buried farther down.  If that's true, the IOMMU driver
might need to supply the top of the PCI hierarchy for which it
translates.  That's what I'm trying to get at with the "bridge"
argument.

If the IOMMU is buried, I assumed there was no point in iterating over
bridges upstream of the IOMMU because it wouldn't see requester IDs of
those upstream bridges.  PCI wouldn't have any way to figure out how
far up to go, hence the argument.

> > +		if (pci_bridge_may_take_ownership(bridge)) {
> > +			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
> > +			if (ret)
> > +				return ret;
> > +		}
> > +		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
> > +	}
> > +
> > +	if (pci_is_pcie(dev) || pci_is_pcix(dev))
> > +		return fn(dev, PCI_REQUESTER_ID(dev), data);
> > +
> > +	/* Conventional PCI has no requester ID */
> > +	return 0;
> > +}
> > +
> >  /*
> >   * find the upstream PCIe-to-PCI bridge of a PCI device
> >   * if the device is PCIE, return NULL
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 3a24e4f..cbcc82f 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
> >  }
> >  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
> >  int pci_dev_present(const struct pci_device_id *ids);
> > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > +			       int (*fn)(struct pci_dev *, void *), void *data);
> >  
> >  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
> >  			     int where, u8 *val);
> > 
> > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Wed Jul 3 12:02:15 2013 -0600
> > 
> >     intel-iommu-upstream-pcie
> 
> I assume this one would is also going to apply dma quirks to dma_ops,
> which I'm all for.
> 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index b91564d..bdb6e6a 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
> >  	return 0;
> >  }
> >  
> > +struct domain_context_info {
> > +	struct dmar_domain *domain;
> > +	struct pci_dev *dev;
> > +	int translation;
> > +	struct intel_iommu *iomumu;
> 
> s/iomumu/iommu/
> 
> > +	bool mapped;
> > +};
> > +
> > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
> > +{
> > +	struct domain_context_info *dev_info = data;
> > +	int segment = pci_domain_nr(data->dev);
> 
> dev_info->dev
> 
> > +	u8 bus = data->requester_id >> 8;
> 
> s/data->//
> 
> > +	u8 devfn = data->requester_id & 0xff;
> 
> s/data->//

I told you it wouldn't even compile :)  I'm only trying to present a
possibility, not all the nitty-gritty details.

> > +
> > +	return domain_context_mapping_one(data->domain, segment,
> > +					  bus, devfn, data->translation);
> > +}
> > +
> >  static int domain_context_mapping(struct dmar_domain *domain,
> >  				  struct pci_dev *pdev, int translation)
> >  {
> > -	int ret;
> > -	struct pci_dev *tmp, *parent;
> > +	struct domain_context_info data;
> > +	struct pci_dev *bridge;
> >  
> > -	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> > -					 pdev->bus->number, pdev->devfn,
> > -					 translation);
> > -	if (ret)
> > -		return ret;
> > +	data.domain = domain;
> > +	data.dev = pdev;
> > +	data.translation = translation;
> > +	bridge = xx;
> 
> Darn, I was hoping you were going to reveal where bridge comes from.
> 
> > +	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
> > +}
> >  
> > -	/* dependent device mapping */
> > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > -	if (!tmp)
> > -		return 0;
> > -	/* Secondary interface's bus number and devfn 0 */
> > -	parent = pdev->bus->self;
> > -	while (parent != tmp) {
> > -		ret = domain_context_mapping_one(domain,
> > -						 pci_domain_nr(parent->bus),
> > -						 parent->bus->number,
> > -						 parent->devfn, translation);
> > -		if (ret)
> > -			return ret;
> > -		parent = parent->bus->self;
> > -	}
> > -	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > -		return domain_context_mapping_one(domain,
> > -					pci_domain_nr(tmp->subordinate),
> > -					tmp->subordinate->number, 0,
> > -					translation);
> > -	else /* this is a legacy PCI bridge */
> > -		return domain_context_mapping_one(domain,
> > -						  pci_domain_nr(tmp->bus),
> > -						  tmp->bus->number,
> > -						  tmp->devfn,
> > -						  translation);
> > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
> > +{
> > +	struct domain_context_info *dev_info = data;
> > +	u8 bus = data->requester_id >> 8;
> > +	u8 devfn = data->requester_id & 0xff;
> > +	bool ret;
> > +
> > +	ret = device_context_mapped(dev_info->iommu, bus, devfn);
> > +	if (!ret)
> > +		dev_info->mapped = false;
> > +
> > +	return 0;
> >  }
> >  
> >  static bool domain_context_mapped(struct pci_dev *pdev)
> >  {
> > -	bool ret;
> > -	struct pci_dev *tmp, *parent;
> > +	struct domain_context_info data;
> >  	struct intel_iommu *iommu;
> > +	struct pci_dev *bridge;
> >  
> >  	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
> >  				pdev->devfn);
> >  	if (!iommu)
> >  		return false;
> >  
> > -	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
> > -	if (!ret)
> > -		return false;
> > -	/* dependent device mapping */
> > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > -	if (!tmp)
> > -		return true;
> > -	/* Secondary interface's bus number and devfn 0 */
> > -	parent = pdev->bus->self;
> > -	while (parent != tmp) {
> > -		ret = device_context_mapped(iommu, parent->bus->number,
> > -					    parent->devfn);
> > -		if (!ret)
> > -			return false;
> > -		parent = parent->bus->self;
> > -	}
> > -	if (pci_is_pcie(tmp))
> > -		return device_context_mapped(iommu, tmp->subordinate->number,
> > -					     0);
> > -	else
> > -		return device_context_mapped(iommu, tmp->bus->number,
> > -					     tmp->devfn);
> > +	data.iommu = iommu;
> > +	data.mapped = true;
> > +	bridge = xx;
> > +	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
> > +	return data.mapped;
> >  }
> >  
> >  /* Returns a number of VTD pages, but aligned to MM page size */
> > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
> >  	return NULL;
> >  }
> >  
> > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
> > +{
> > +	struct domain_context_info *dev_info = data;
> > +	int segment = pci_domain_nr(data->dev);
> > +	u8 bus = data->requester_id >> 8;
> > +	u8 devfn = data->requester_id & 0xff;
> > +	unsigned long flags;
> > +	struct device_domain_info *info;
> > +
> > +	spin_lock_irqsave(&device_domain_lock, flags);
> > +	list_for_each_entry(info, &device_domain_list, global) {
> > +		if (info->segment == segment &&
> > +		    info->bus == bus && info->devfn == devfn) {
> > +			data->domain = info->domain;
> > +			break;
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&device_domain_lock, flags);
> > +	return 0;
> > +}
> > +
> >  /* domain is initialized */
> >  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> >  {
> > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> >  	struct intel_iommu *iommu;
> >  	struct dmar_drhd_unit *drhd;
> >  	struct device_domain_info *info, *tmp;
> > -	struct pci_dev *dev_tmp;
> >  	unsigned long flags;
> >  	int bus = 0, devfn = 0;
> >  	int segment;
> >  	int ret;
> > +	struct domain_context_info data;
> >  
> >  	domain = find_domain(pdev);
> >  	if (domain)
> > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> >  
> >  	segment = pci_domain_nr(pdev->bus);
> >  
> > -	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> > -	if (dev_tmp) {
> > -		if (pci_is_pcie(dev_tmp)) {
> > -			bus = dev_tmp->subordinate->number;
> > -			devfn = 0;
> > -		} else {
> > -			bus = dev_tmp->bus->number;
> > -			devfn = dev_tmp->devfn;
> > -		}
> > -		spin_lock_irqsave(&device_domain_lock, flags);
> > -		list_for_each_entry(info, &device_domain_list, global) {
> > -			if (info->segment == segment &&
> > -			    info->bus == bus && info->devfn == devfn) {
> > -				found = info->domain;
> > -				break;
> > -			}
> > -		}
> > -		spin_unlock_irqrestore(&device_domain_lock, flags);
> > -		/* pcie-pci bridge already has a domain, uses it */
> > -		if (found) {
> > -			domain = found;
> > -			goto found_domain;
> > -		}
> > +	data.domain = NULL;
> > +	data.dev = pdev;
> > +	bridge = xx;
> > +	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
> > +	if (data.domain) {
> > +		domain = data.domain;
> > +		goto found_domain;
> >  	}
> >  
> >  	domain = alloc_domain();
> > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
> >  	return 0;
> >  }
> >  
> > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
> > +{
> > +	struct domain_context_info *dev_info = data;
> > +	struct intel_iommu *iommu = data->iommu;
> > +	u8 bus = data->requester_id >> 8;
> > +	u8 devfn = data->requester_id & 0xff;
> > +
> > +	iommu_detach_dev(iommu, bus, devfn);
> > +	return 0;
> > +}
> > +
> >  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
> >  					   struct pci_dev *pdev)
> >  {
> > -	struct pci_dev *tmp, *parent;
> > +	struct domain_context_info data;
> >  
> >  	if (!iommu || !pdev)
> >  		return;
> >  
> >  	/* dependent device detach */
> > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > -	/* Secondary interface's bus number and devfn 0 */
> > -	if (tmp) {
> > -		parent = pdev->bus->self;
> > -		while (parent != tmp) {
> > -			iommu_detach_dev(iommu, parent->bus->number,
> > -					 parent->devfn);
> > -			parent = parent->bus->self;
> > -		}
> > -		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > -			iommu_detach_dev(iommu,
> > -				tmp->subordinate->number, 0);
> > -		else /* this is a legacy PCI bridge */
> > -			iommu_detach_dev(iommu, tmp->bus->number,
> > -					 tmp->devfn);
> > -	}
> > +	data.iommu = iommu;
> > +	bridge = xx;
> > +	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
> >  }
> >  
> >  static void domain_remove_one_dev_info(struct dmar_domain *domain,
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-09 18:27           ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-07-09 18:27 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: iommu, dwmw2, joro, stephen, linux-pci, ddutile

On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > > > Joerg,
> > > > 
> > > > Where do we stand on this series?  You had a concern that the heuristic
> > > > used in patch 1/ could be dangerous.  The suggestion for detecting the
> > > > issue was actually from Bjorn who replied with his rationale.  Do you
> > > > want to go in the direction of a fixed whitelist or do you agree that
> > > > even if the heuristic breaks it provides better behavior than what we
> > > > have now?  Thanks,
> > > 
> > > I'm trying to take a step back and look at the overall design, not
> > > these specific patches.
> > > 
> > > IOMMUs translate addresses based on their source, i.e., a PCIe
> > > requester ID.  This is made more complicated by the fact that some
> > > bridges "take ownership" (change the requester ID as they forward
> > > transactions upstream), as well as the fact that conventional PCI has
> > > no requester ID at all.  And some broken devices apparently generate
> > > DMA requests using the requester ID of another device.
> > > 
> > > We currently deal with this using the pci_find_upstream_pcie_bridge()
> > > and pci_get_dma_source() interfaces, but I think there's too much
> > > assembly required by their users.  pci_find_upstream_pcie_bridge()
> > > callers normally loop through all the bridges between the "upstream
> > > PCIe bridge" and the device, checking for bridges that might take
> > > ownership.  They probably also ought to use pci_get_dma_source() to
> > > account for the broken devices, but most callers don't.
> > > 
> > > Most of this is PCI-specific stuff that should be of interest to all
> > > IOMMU drivers, and the overall structure of calls and looping should
> > > be the same for all of them, so it would be nice to factor it out
> > > somehow.
> > > 
> > > The attached patch is guaranteed not to even compile; it's just to
> > > make this idea more concrete.  The basic idea is that since the IOMMU
> > > driver wants to perform some action for each possible requester ID the
> > > IOMMU might see, PCI could provide an iterator
> > > ("pci_for_each_requester_id()") to do that.
> > > 
> > > Bjorn
> > > 
> > > 
> > > commit afad51492c6672b96c2b0735600d5695e30f7180
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date:   Wed Jul 3 16:04:26 2013 -0600
> > > 
> > >     pci-add-for-each-requester-id
> > > 
> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > > index d0627fa..380eb03 100644
> > > --- a/drivers/pci/search.c
> > > +++ b/drivers/pci/search.c
> > > @@ -17,6 +17,89 @@
> > >  DECLARE_RWSEM(pci_bus_sem);
> > >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> > >  
> > > +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
> > > +
> > > +static inline bool pci_is_pcix(struct pci_dev *dev)
> > > +{
> > > +	return !!pci_pcix_cap(dev);	/* XXX not implemented */
> > > +}
> > > +
> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> > > +{
> > > +	/*
> > > +	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> > > +	 * Spec v1.0, sec 2.3.
> > > +	 */
> > > +	if (pci_is_pcie(bridge) &&
> > > +	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> > > +		return true;
> > 
> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
> > capability are exactly what causes the current bug.
> 
> Likely so.  We might be able to identify that as "secondary bus is not
> PCIe" or something.
> 
> > > +
> > > +	/*
> > > +	 * A PCI-X to PCI-X bridge need not take ownership because there
> > > +	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
> > > +	 * device is added on the secondary bus, the bridge must revert to
> > > +	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
> > > +	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
> > > +	 * tolerate a future hot-add without having to change existing
> > > +	 * IOMMU mappings.
> > > +	 */
> > > +	if (pci_is_pcix(bridge))
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> > > +					 struct pci_dev *dev)
> > > +{
> > > +	struct pci_dev *bridge;
> > > +	struct pci_bus *child_bus;
> > > +	u8 secondary, subordinate, busn = dev->bus->number;
> > > +
> > > +	if (dev->bus == bus)
> > > +		return dev;
> > > +
> > > +	/*
> > > +	 * There may be several devices on "bus".  Find the one that is a
> > > +	 * bridge leading to "dev".
> > > +	 */
> > > +	list_for_each_entry(bridge, &bus->devices, bus_list) {
> > > +		child_bus = bridge->subordinate;
> > > +		if (child_bus) {
> > > +			secondary = child_bus->busn_res.start;
> > > +			subordinate = child_bus->busn_res.end;
> > > +			if (secondary <= busn && busn <= subordinate)
> > > +				return bridge;
> > > +		}
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > > +			      int (*fn)(struct pci_dev *, void *),
> > 
> > u16 requester_id
> > 
> > > +			      void *data)
> > > +{
> > > +	int ret;
> > > +
> > > +	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */
> > 
> > It's unfortunate that our dma quirk still seems to only get called for
> > the end device where we could have bridges that need it too.
> 
> Sure, we can do that if necessary.
> 
> I'm not trying to present this as any kind of finished solution.  The
> point is to figure out if we can make an interface that's a little
> more abstract so we can pull the PCI topology issues out of the IOMMU
> drivers, and I just thought something that looked like code would be
> more concrete and easier to discuss.
> 
> > > +
> > > +	while (bridge != dev) {
> > 
> > 
> > I'm having a hard time understanding this function since bridge is never
> > defined in any of the below sample code.  Would bridge be the root port
> > above a device?  And what about devices connected to the RC?  Seems like
> > we'd need to traverse up the bus first, so there's a bit of missing
> > magic.
> 
> I was assuming that an IOMMU could be either at the root of a PCI
> hierarchy or buried farther down.  If that's true, the IOMMU driver
> might need to supply the top of the PCI hierarchy for which it
> translates.  That's what I'm trying to get at with the "bridge"
> argument.
> 
> If the IOMMU is buried, I assumed there was no point in iterating over
> bridges upstream of the IOMMU because it wouldn't see requester IDs of
> those upstream bridges.  PCI wouldn't have any way to figure out how
> far up to go, hence the argument.

I don't think that getting the bridge is going to be quite so simple.
VT-d can have multiple IOMMUs, each handling a specific device or
hierarchy.  One IOMMU per domain can be default IOMMU for any device not
otherwise specified.  The IOMMU is not a PCI enumerable device.  AMD-Vi
does have a PCI visible IOMMU, but it's an endpoint on the root complex,
not the top of a hierarchy.

I'm warming up to your approach, but I think I'd rather see something
like:

struct pci_dev *pci_get_requester_id(struct pci_dev *pdev)
{
	do {
		/* Apply DMA quirks at every step */
		pdev = pci_get_dma_source(pdev);

		/* Done for PCIe devices or broken bridges, PCI-X TBD */
		if (pci_is_pcie(pdev) || /* broken pcie bridge test */)
			break;

	} while (!pci_is_root_bus(pdev->bus) && (pdev = pdev->bus->self));

	return pdev;
}


/* Start with requester ID pdev and pdev->bus, this way we handle both peer
 * and parent requester IDs */
int pci_for_each_requester_id(struct pci_dev *pdev, struct pci_bus *bus,
				     int (*fn)(struct pci_dev *, void *), void *data)
{
	struct pci_dev *tmp;
	int ret;

	list_for_each_entry(tmp, &bus->devices, bus_list) {
		if (pci_get_requester_id(tmp) == pdev) {
			ret = fn(tmp, data);
			if (ret)
				...

			if (tmp->subordinate) {
				ret = pci_for_each_requester_id(pdev, tmp->subordinate, fn);
				if (ret)
					...
			}
		}
	}
	
	return ret;
}

Seems like that would be sufficient to leverage the intel-iommu changes
you've drafted.  Thanks,

Alex

> > > +		if (pci_bridge_may_take_ownership(bridge)) {
> > > +			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > > +		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
> > > +	}
> > > +
> > > +	if (pci_is_pcie(dev) || pci_is_pcix(dev))
> > > +		return fn(dev, PCI_REQUESTER_ID(dev), data);
> > > +
> > > +	/* Conventional PCI has no requester ID */
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * find the upstream PCIe-to-PCI bridge of a PCI device
> > >   * if the device is PCIE, return NULL
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 3a24e4f..cbcc82f 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
> > >  }
> > >  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
> > >  int pci_dev_present(const struct pci_device_id *ids);
> > > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > > +			       int (*fn)(struct pci_dev *, void *), void *data);
> > >  
> > >  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
> > >  			     int where, u8 *val);
> > > 
> > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date:   Wed Jul 3 12:02:15 2013 -0600
> > > 
> > >     intel-iommu-upstream-pcie
> > 
> > I assume this one would is also going to apply dma quirks to dma_ops,
> > which I'm all for.
> > 
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > index b91564d..bdb6e6a 100644
> > > --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
> > >  	return 0;
> > >  }
> > >  
> > > +struct domain_context_info {
> > > +	struct dmar_domain *domain;
> > > +	struct pci_dev *dev;
> > > +	int translation;
> > > +	struct intel_iommu *iomumu;
> > 
> > s/iomumu/iommu/
> > 
> > > +	bool mapped;
> > > +};
> > > +
> > > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	int segment = pci_domain_nr(data->dev);
> > 
> > dev_info->dev
> > 
> > > +	u8 bus = data->requester_id >> 8;
> > 
> > s/data->//
> > 
> > > +	u8 devfn = data->requester_id & 0xff;
> > 
> > s/data->//
> 
> I told you it wouldn't even compile :)  I'm only trying to present a
> possibility, not all the nitty-gritty details.
> 
> > > +
> > > +	return domain_context_mapping_one(data->domain, segment,
> > > +					  bus, devfn, data->translation);
> > > +}
> > > +
> > >  static int domain_context_mapping(struct dmar_domain *domain,
> > >  				  struct pci_dev *pdev, int translation)
> > >  {
> > > -	int ret;
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > > +	struct pci_dev *bridge;
> > >  
> > > -	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> > > -					 pdev->bus->number, pdev->devfn,
> > > -					 translation);
> > > -	if (ret)
> > > -		return ret;
> > > +	data.domain = domain;
> > > +	data.dev = pdev;
> > > +	data.translation = translation;
> > > +	bridge = xx;
> > 
> > Darn, I was hoping you were going to reveal where bridge comes from.
> > 
> > > +	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
> > > +}
> > >  
> > > -	/* dependent device mapping */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (!tmp)
> > > -		return 0;
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	parent = pdev->bus->self;
> > > -	while (parent != tmp) {
> > > -		ret = domain_context_mapping_one(domain,
> > > -						 pci_domain_nr(parent->bus),
> > > -						 parent->bus->number,
> > > -						 parent->devfn, translation);
> > > -		if (ret)
> > > -			return ret;
> > > -		parent = parent->bus->self;
> > > -	}
> > > -	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > > -		return domain_context_mapping_one(domain,
> > > -					pci_domain_nr(tmp->subordinate),
> > > -					tmp->subordinate->number, 0,
> > > -					translation);
> > > -	else /* this is a legacy PCI bridge */
> > > -		return domain_context_mapping_one(domain,
> > > -						  pci_domain_nr(tmp->bus),
> > > -						  tmp->bus->number,
> > > -						  tmp->devfn,
> > > -						  translation);
> > > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +	bool ret;
> > > +
> > > +	ret = device_context_mapped(dev_info->iommu, bus, devfn);
> > > +	if (!ret)
> > > +		dev_info->mapped = false;
> > > +
> > > +	return 0;
> > >  }
> > >  
> > >  static bool domain_context_mapped(struct pci_dev *pdev)
> > >  {
> > > -	bool ret;
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > >  	struct intel_iommu *iommu;
> > > +	struct pci_dev *bridge;
> > >  
> > >  	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
> > >  				pdev->devfn);
> > >  	if (!iommu)
> > >  		return false;
> > >  
> > > -	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
> > > -	if (!ret)
> > > -		return false;
> > > -	/* dependent device mapping */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (!tmp)
> > > -		return true;
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	parent = pdev->bus->self;
> > > -	while (parent != tmp) {
> > > -		ret = device_context_mapped(iommu, parent->bus->number,
> > > -					    parent->devfn);
> > > -		if (!ret)
> > > -			return false;
> > > -		parent = parent->bus->self;
> > > -	}
> > > -	if (pci_is_pcie(tmp))
> > > -		return device_context_mapped(iommu, tmp->subordinate->number,
> > > -					     0);
> > > -	else
> > > -		return device_context_mapped(iommu, tmp->bus->number,
> > > -					     tmp->devfn);
> > > +	data.iommu = iommu;
> > > +	data.mapped = true;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
> > > +	return data.mapped;
> > >  }
> > >  
> > >  /* Returns a number of VTD pages, but aligned to MM page size */
> > > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
> > >  	return NULL;
> > >  }
> > >  
> > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	int segment = pci_domain_nr(data->dev);
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +	unsigned long flags;
> > > +	struct device_domain_info *info;
> > > +
> > > +	spin_lock_irqsave(&device_domain_lock, flags);
> > > +	list_for_each_entry(info, &device_domain_list, global) {
> > > +		if (info->segment == segment &&
> > > +		    info->bus == bus && info->devfn == devfn) {
> > > +			data->domain = info->domain;
> > > +			break;
> > > +		}
> > > +	}
> > > +	spin_unlock_irqrestore(&device_domain_lock, flags);
> > > +	return 0;
> > > +}
> > > +
> > >  /* domain is initialized */
> > >  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  {
> > > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  	struct intel_iommu *iommu;
> > >  	struct dmar_drhd_unit *drhd;
> > >  	struct device_domain_info *info, *tmp;
> > > -	struct pci_dev *dev_tmp;
> > >  	unsigned long flags;
> > >  	int bus = 0, devfn = 0;
> > >  	int segment;
> > >  	int ret;
> > > +	struct domain_context_info data;
> > >  
> > >  	domain = find_domain(pdev);
> > >  	if (domain)
> > > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  
> > >  	segment = pci_domain_nr(pdev->bus);
> > >  
> > > -	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (dev_tmp) {
> > > -		if (pci_is_pcie(dev_tmp)) {
> > > -			bus = dev_tmp->subordinate->number;
> > > -			devfn = 0;
> > > -		} else {
> > > -			bus = dev_tmp->bus->number;
> > > -			devfn = dev_tmp->devfn;
> > > -		}
> > > -		spin_lock_irqsave(&device_domain_lock, flags);
> > > -		list_for_each_entry(info, &device_domain_list, global) {
> > > -			if (info->segment == segment &&
> > > -			    info->bus == bus && info->devfn == devfn) {
> > > -				found = info->domain;
> > > -				break;
> > > -			}
> > > -		}
> > > -		spin_unlock_irqrestore(&device_domain_lock, flags);
> > > -		/* pcie-pci bridge already has a domain, uses it */
> > > -		if (found) {
> > > -			domain = found;
> > > -			goto found_domain;
> > > -		}
> > > +	data.domain = NULL;
> > > +	data.dev = pdev;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
> > > +	if (data.domain) {
> > > +		domain = data.domain;
> > > +		goto found_domain;
> > >  	}
> > >  
> > >  	domain = alloc_domain();
> > > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
> > >  	return 0;
> > >  }
> > >  
> > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	struct intel_iommu *iommu = data->iommu;
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +
> > > +	iommu_detach_dev(iommu, bus, devfn);
> > > +	return 0;
> > > +}
> > > +
> > >  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
> > >  					   struct pci_dev *pdev)
> > >  {
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > >  
> > >  	if (!iommu || !pdev)
> > >  		return;
> > >  
> > >  	/* dependent device detach */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	if (tmp) {
> > > -		parent = pdev->bus->self;
> > > -		while (parent != tmp) {
> > > -			iommu_detach_dev(iommu, parent->bus->number,
> > > -					 parent->devfn);
> > > -			parent = parent->bus->self;
> > > -		}
> > > -		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > > -			iommu_detach_dev(iommu,
> > > -				tmp->subordinate->number, 0);
> > > -		else /* this is a legacy PCI bridge */
> > > -			iommu_detach_dev(iommu, tmp->bus->number,
> > > -					 tmp->devfn);
> > > -	}
> > > +	data.iommu = iommu;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
> > >  }
> > >  
> > >  static void domain_remove_one_dev_info(struct dmar_domain *domain,
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> > 




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-09 18:27           ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2013-07-09 18:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote:
> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > > > Joerg,
> > > > 
> > > > Where do we stand on this series?  You had a concern that the heuristic
> > > > used in patch 1/ could be dangerous.  The suggestion for detecting the
> > > > issue was actually from Bjorn who replied with his rationale.  Do you
> > > > want to go in the direction of a fixed whitelist or do you agree that
> > > > even if the heuristic breaks it provides better behavior than what we
> > > > have now?  Thanks,
> > > 
> > > I'm trying to take a step back and look at the overall design, not
> > > these specific patches.
> > > 
> > > IOMMUs translate addresses based on their source, i.e., a PCIe
> > > requester ID.  This is made more complicated by the fact that some
> > > bridges "take ownership" (change the requester ID as they forward
> > > transactions upstream), as well as the fact that conventional PCI has
> > > no requester ID at all.  And some broken devices apparently generate
> > > DMA requests using the requester ID of another device.
> > > 
> > > We currently deal with this using the pci_find_upstream_pcie_bridge()
> > > and pci_get_dma_source() interfaces, but I think there's too much
> > > assembly required by their users.  pci_find_upstream_pcie_bridge()
> > > callers normally loop through all the bridges between the "upstream
> > > PCIe bridge" and the device, checking for bridges that might take
> > > ownership.  They probably also ought to use pci_get_dma_source() to
> > > account for the broken devices, but most callers don't.
> > > 
> > > Most of this is PCI-specific stuff that should be of interest to all
> > > IOMMU drivers, and the overall structure of calls and looping should
> > > be the same for all of them, so it would be nice to factor it out
> > > somehow.
> > > 
> > > The attached patch is guaranteed not to even compile; it's just to
> > > make this idea more concrete.  The basic idea is that since the IOMMU
> > > driver wants to perform some action for each possible requester ID the
> > > IOMMU might see, PCI could provide an iterator
> > > ("pci_for_each_requester_id()") to do that.
> > > 
> > > Bjorn
> > > 
> > > 
> > > commit afad51492c6672b96c2b0735600d5695e30f7180
> > > Author: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > Date:   Wed Jul 3 16:04:26 2013 -0600
> > > 
> > >     pci-add-for-each-requester-id
> > > 
> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > > index d0627fa..380eb03 100644
> > > --- a/drivers/pci/search.c
> > > +++ b/drivers/pci/search.c
> > > @@ -17,6 +17,89 @@
> > >  DECLARE_RWSEM(pci_bus_sem);
> > >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> > >  
> > > +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge)	((bridge)->subordinate->number << 8)
> > > +
> > > +static inline bool pci_is_pcix(struct pci_dev *dev)
> > > +{
> > > +	return !!pci_pcix_cap(dev);	/* XXX not implemented */
> > > +}
> > > +
> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> > > +{
> > > +	/*
> > > +	 * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> > > +	 * Spec v1.0, sec 2.3.
> > > +	 */
> > > +	if (pci_is_pcie(bridge) &&
> > > +	    pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> > > +		return true;
> > 
> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
> > capability are exactly what causes the current bug.
> 
> Likely so.  We might be able to identify that as "secondary bus is not
> PCIe" or something.
> 
> > > +
> > > +	/*
> > > +	 * A PCI-X to PCI-X bridge need not take ownership because there
> > > +	 * are requester IDs on the secondary PCI-X bus.  However, if a PCI
> > > +	 * device is added on the secondary bus, the bridge must revert to
> > > +	 * being a PCI-X to PCI bridge, and it then *would* take ownership.
> > > +	 * Assuming a PCI-X to PCI-X bridge takes ownership means we can
> > > +	 * tolerate a future hot-add without having to change existing
> > > +	 * IOMMU mappings.
> > > +	 */
> > > +	if (pci_is_pcix(bridge))
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> > > +					 struct pci_dev *dev)
> > > +{
> > > +	struct pci_dev *bridge;
> > > +	struct pci_bus *child_bus;
> > > +	u8 secondary, subordinate, busn = dev->bus->number;
> > > +
> > > +	if (dev->bus == bus)
> > > +		return dev;
> > > +
> > > +	/*
> > > +	 * There may be several devices on "bus".  Find the one that is a
> > > +	 * bridge leading to "dev".
> > > +	 */
> > > +	list_for_each_entry(bridge, &bus->devices, bus_list) {
> > > +		child_bus = bridge->subordinate;
> > > +		if (child_bus) {
> > > +			secondary = child_bus->busn_res.start;
> > > +			subordinate = child_bus->busn_res.end;
> > > +			if (secondary <= busn && busn <= subordinate)
> > > +				return bridge;
> > > +		}
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > > +			      int (*fn)(struct pci_dev *, void *),
> > 
> > u16 requester_id
> > 
> > > +			      void *data)
> > > +{
> > > +	int ret;
> > > +
> > > +	dev = pci_get_dma_source(dev);	/* XXX ref count screwup */
> > 
> > It's unfortunate that our dma quirk still seems to only get called for
> > the end device where we could have bridges that need it too.
> 
> Sure, we can do that if necessary.
> 
> I'm not trying to present this as any kind of finished solution.  The
> point is to figure out if we can make an interface that's a little
> more abstract so we can pull the PCI topology issues out of the IOMMU
> drivers, and I just thought something that looked like code would be
> more concrete and easier to discuss.
> 
> > > +
> > > +	while (bridge != dev) {
> > 
> > 
> > I'm having a hard time understanding this function since bridge is never
> > defined in any of the below sample code.  Would bridge be the root port
> > above a device?  And what about devices connected to the RC?  Seems like
> > we'd need to traverse up the bus first, so there's a bit of missing
> > magic.
> 
> I was assuming that an IOMMU could be either at the root of a PCI
> hierarchy or buried farther down.  If that's true, the IOMMU driver
> might need to supply the top of the PCI hierarchy for which it
> translates.  That's what I'm trying to get at with the "bridge"
> argument.
> 
> If the IOMMU is buried, I assumed there was no point in iterating over
> bridges upstream of the IOMMU because it wouldn't see requester IDs of
> those upstream bridges.  PCI wouldn't have any way to figure out how
> far up to go, hence the argument.

I don't think that getting the bridge is going to be quite so simple.
VT-d can have multiple IOMMUs, each handling a specific device or
hierarchy.  One IOMMU per domain can be default IOMMU for any device not
otherwise specified.  The IOMMU is not a PCI enumerable device.  AMD-Vi
does have a PCI visible IOMMU, but it's an endpoint on the root complex,
not the top of a hierarchy.

I'm warming up to your approach, but I think I'd rather see something
like:

struct pci_dev *pci_get_requester_id(struct pci_dev *pdev)
{
	do {
		/* Apply DMA quirks at every step */
		pdev = pci_get_dma_source(pdev);

		/* Done for PCIe devices or broken bridges, PCI-X TBD */
		if (pci_is_pcie(pdev) || /* broken pcie bridge test */)
			break;

	} while (!pci_is_root_bus(pdev->bus) && (pdev = pdev->bus->self));

	return pdev;
}


/* Start with requester ID pdev and pdev->bus, this way we handle both peer
 * and parent requester IDs */
int pci_for_each_requester_id(struct pci_dev *pdev, struct pci_bus *bus,
				     int (*fn)(struct pci_dev *, void *), void *data)
{
	struct pci_dev *tmp;
	int ret;

	list_for_each_entry(tmp, &bus->devices, bus_list) {
		if (pci_get_requester_id(tmp) == pdev) {
			ret = fn(tmp, data);
			if (ret)
				...

			if (tmp->subordinate) {
				ret = pci_for_each_requester_id(pdev, tmp->subordinate, fn);
				if (ret)
					...
			}
		}
	}
	
	return ret;
}

Seems like that would be sufficient to leverage the intel-iommu changes
you've drafted.  Thanks,

Alex

> > > +		if (pci_bridge_may_take_ownership(bridge)) {
> > > +			ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > > +		bridge = pci_bridge_to_dev(bridge->subordinate, dev);
> > > +	}
> > > +
> > > +	if (pci_is_pcie(dev) || pci_is_pcix(dev))
> > > +		return fn(dev, PCI_REQUESTER_ID(dev), data);
> > > +
> > > +	/* Conventional PCI has no requester ID */
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * find the upstream PCIe-to-PCI bridge of a PCI device
> > >   * if the device is PCIE, return NULL
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 3a24e4f..cbcc82f 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
> > >  }
> > >  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
> > >  int pci_dev_present(const struct pci_device_id *ids);
> > > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
> > > +			       int (*fn)(struct pci_dev *, void *), void *data);
> > >  
> > >  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
> > >  			     int where, u8 *val);
> > > 
> > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
> > > Author: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > Date:   Wed Jul 3 12:02:15 2013 -0600
> > > 
> > >     intel-iommu-upstream-pcie
> > 
> > I assume this one would is also going to apply dma quirks to dma_ops,
> > which I'm all for.
> > 
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > index b91564d..bdb6e6a 100644
> > > --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
> > >  	return 0;
> > >  }
> > >  
> > > +struct domain_context_info {
> > > +	struct dmar_domain *domain;
> > > +	struct pci_dev *dev;
> > > +	int translation;
> > > +	struct intel_iommu *iomumu;
> > 
> > s/iomumu/iommu/
> > 
> > > +	bool mapped;
> > > +};
> > > +
> > > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	int segment = pci_domain_nr(data->dev);
> > 
> > dev_info->dev
> > 
> > > +	u8 bus = data->requester_id >> 8;
> > 
> > s/data->//
> > 
> > > +	u8 devfn = data->requester_id & 0xff;
> > 
> > s/data->//
> 
> I told you it wouldn't even compile :)  I'm only trying to present a
> possibility, not all the nitty-gritty details.
> 
> > > +
> > > +	return domain_context_mapping_one(data->domain, segment,
> > > +					  bus, devfn, data->translation);
> > > +}
> > > +
> > >  static int domain_context_mapping(struct dmar_domain *domain,
> > >  				  struct pci_dev *pdev, int translation)
> > >  {
> > > -	int ret;
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > > +	struct pci_dev *bridge;
> > >  
> > > -	ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
> > > -					 pdev->bus->number, pdev->devfn,
> > > -					 translation);
> > > -	if (ret)
> > > -		return ret;
> > > +	data.domain = domain;
> > > +	data.dev = pdev;
> > > +	data.translation = translation;
> > > +	bridge = xx;
> > 
> > Darn, I was hoping you were going to reveal where bridge comes from.
> > 
> > > +	return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
> > > +}
> > >  
> > > -	/* dependent device mapping */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (!tmp)
> > > -		return 0;
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	parent = pdev->bus->self;
> > > -	while (parent != tmp) {
> > > -		ret = domain_context_mapping_one(domain,
> > > -						 pci_domain_nr(parent->bus),
> > > -						 parent->bus->number,
> > > -						 parent->devfn, translation);
> > > -		if (ret)
> > > -			return ret;
> > > -		parent = parent->bus->self;
> > > -	}
> > > -	if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > > -		return domain_context_mapping_one(domain,
> > > -					pci_domain_nr(tmp->subordinate),
> > > -					tmp->subordinate->number, 0,
> > > -					translation);
> > > -	else /* this is a legacy PCI bridge */
> > > -		return domain_context_mapping_one(domain,
> > > -						  pci_domain_nr(tmp->bus),
> > > -						  tmp->bus->number,
> > > -						  tmp->devfn,
> > > -						  translation);
> > > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +	bool ret;
> > > +
> > > +	ret = device_context_mapped(dev_info->iommu, bus, devfn);
> > > +	if (!ret)
> > > +		dev_info->mapped = false;
> > > +
> > > +	return 0;
> > >  }
> > >  
> > >  static bool domain_context_mapped(struct pci_dev *pdev)
> > >  {
> > > -	bool ret;
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > >  	struct intel_iommu *iommu;
> > > +	struct pci_dev *bridge;
> > >  
> > >  	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
> > >  				pdev->devfn);
> > >  	if (!iommu)
> > >  		return false;
> > >  
> > > -	ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
> > > -	if (!ret)
> > > -		return false;
> > > -	/* dependent device mapping */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (!tmp)
> > > -		return true;
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	parent = pdev->bus->self;
> > > -	while (parent != tmp) {
> > > -		ret = device_context_mapped(iommu, parent->bus->number,
> > > -					    parent->devfn);
> > > -		if (!ret)
> > > -			return false;
> > > -		parent = parent->bus->self;
> > > -	}
> > > -	if (pci_is_pcie(tmp))
> > > -		return device_context_mapped(iommu, tmp->subordinate->number,
> > > -					     0);
> > > -	else
> > > -		return device_context_mapped(iommu, tmp->bus->number,
> > > -					     tmp->devfn);
> > > +	data.iommu = iommu;
> > > +	data.mapped = true;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
> > > +	return data.mapped;
> > >  }
> > >  
> > >  /* Returns a number of VTD pages, but aligned to MM page size */
> > > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
> > >  	return NULL;
> > >  }
> > >  
> > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	int segment = pci_domain_nr(data->dev);
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +	unsigned long flags;
> > > +	struct device_domain_info *info;
> > > +
> > > +	spin_lock_irqsave(&device_domain_lock, flags);
> > > +	list_for_each_entry(info, &device_domain_list, global) {
> > > +		if (info->segment == segment &&
> > > +		    info->bus == bus && info->devfn == devfn) {
> > > +			data->domain = info->domain;
> > > +			break;
> > > +		}
> > > +	}
> > > +	spin_unlock_irqrestore(&device_domain_lock, flags);
> > > +	return 0;
> > > +}
> > > +
> > >  /* domain is initialized */
> > >  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  {
> > > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  	struct intel_iommu *iommu;
> > >  	struct dmar_drhd_unit *drhd;
> > >  	struct device_domain_info *info, *tmp;
> > > -	struct pci_dev *dev_tmp;
> > >  	unsigned long flags;
> > >  	int bus = 0, devfn = 0;
> > >  	int segment;
> > >  	int ret;
> > > +	struct domain_context_info data;
> > >  
> > >  	domain = find_domain(pdev);
> > >  	if (domain)
> > > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
> > >  
> > >  	segment = pci_domain_nr(pdev->bus);
> > >  
> > > -	dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	if (dev_tmp) {
> > > -		if (pci_is_pcie(dev_tmp)) {
> > > -			bus = dev_tmp->subordinate->number;
> > > -			devfn = 0;
> > > -		} else {
> > > -			bus = dev_tmp->bus->number;
> > > -			devfn = dev_tmp->devfn;
> > > -		}
> > > -		spin_lock_irqsave(&device_domain_lock, flags);
> > > -		list_for_each_entry(info, &device_domain_list, global) {
> > > -			if (info->segment == segment &&
> > > -			    info->bus == bus && info->devfn == devfn) {
> > > -				found = info->domain;
> > > -				break;
> > > -			}
> > > -		}
> > > -		spin_unlock_irqrestore(&device_domain_lock, flags);
> > > -		/* pcie-pci bridge already has a domain, uses it */
> > > -		if (found) {
> > > -			domain = found;
> > > -			goto found_domain;
> > > -		}
> > > +	data.domain = NULL;
> > > +	data.dev = pdev;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
> > > +	if (data.domain) {
> > > +		domain = data.domain;
> > > +		goto found_domain;
> > >  	}
> > >  
> > >  	domain = alloc_domain();
> > > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
> > >  	return 0;
> > >  }
> > >  
> > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
> > > +{
> > > +	struct domain_context_info *dev_info = data;
> > > +	struct intel_iommu *iommu = data->iommu;
> > > +	u8 bus = data->requester_id >> 8;
> > > +	u8 devfn = data->requester_id & 0xff;
> > > +
> > > +	iommu_detach_dev(iommu, bus, devfn);
> > > +	return 0;
> > > +}
> > > +
> > >  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
> > >  					   struct pci_dev *pdev)
> > >  {
> > > -	struct pci_dev *tmp, *parent;
> > > +	struct domain_context_info data;
> > >  
> > >  	if (!iommu || !pdev)
> > >  		return;
> > >  
> > >  	/* dependent device detach */
> > > -	tmp = pci_find_upstream_pcie_bridge(pdev);
> > > -	/* Secondary interface's bus number and devfn 0 */
> > > -	if (tmp) {
> > > -		parent = pdev->bus->self;
> > > -		while (parent != tmp) {
> > > -			iommu_detach_dev(iommu, parent->bus->number,
> > > -					 parent->devfn);
> > > -			parent = parent->bus->self;
> > > -		}
> > > -		if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > > -			iommu_detach_dev(iommu,
> > > -				tmp->subordinate->number, 0);
> > > -		else /* this is a legacy PCI bridge */
> > > -			iommu_detach_dev(iommu, tmp->bus->number,
> > > -					 tmp->devfn);
> > > -	}
> > > +	data.iommu = iommu;
> > > +	bridge = xx;
> > > +	pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
> > >  }
> > >  
> > >  static void domain_remove_one_dev_info(struct dmar_domain *domain,
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
  2013-07-09 18:27           ` Alex Williamson
  (?)
@ 2013-07-09 20:10           ` Bjorn Helgaas
  -1 siblings, 0 replies; 34+ messages in thread
From: Bjorn Helgaas @ 2013-07-09 20:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: open list:INTEL IOMMU (VT-d),
	David Woodhouse, Joerg Roedel, Stephen Hemminger, linux-pci,
	Don Dutile

On Tue, Jul 9, 2013 at 12:27 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote:
>> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
>> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
>> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
>> > > > Joerg,
>> > > >
>> > > > Where do we stand on this series?  You had a concern that the heuristic
>> > > > used in patch 1/ could be dangerous.  The suggestion for detecting the
>> > > > issue was actually from Bjorn who replied with his rationale.  Do you
>> > > > want to go in the direction of a fixed whitelist or do you agree that
>> > > > even if the heuristic breaks it provides better behavior than what we
>> > > > have now?  Thanks,
>> > >
>> > > I'm trying to take a step back and look at the overall design, not
>> > > these specific patches.
>> > >
>> > > IOMMUs translate addresses based on their source, i.e., a PCIe
>> > > requester ID.  This is made more complicated by the fact that some
>> > > bridges "take ownership" (change the requester ID as they forward
>> > > transactions upstream), as well as the fact that conventional PCI has
>> > > no requester ID at all.  And some broken devices apparently generate
>> > > DMA requests using the requester ID of another device.
>> > >
>> > > We currently deal with this using the pci_find_upstream_pcie_bridge()
>> > > and pci_get_dma_source() interfaces, but I think there's too much
>> > > assembly required by their users.  pci_find_upstream_pcie_bridge()
>> > > callers normally loop through all the bridges between the "upstream
>> > > PCIe bridge" and the device, checking for bridges that might take
>> > > ownership.  They probably also ought to use pci_get_dma_source() to
>> > > account for the broken devices, but most callers don't.
>> > >
>> > > Most of this is PCI-specific stuff that should be of interest to all
>> > > IOMMU drivers, and the overall structure of calls and looping should
>> > > be the same for all of them, so it would be nice to factor it out
>> > > somehow.
>> > >
>> > > The attached patch is guaranteed not to even compile; it's just to
>> > > make this idea more concrete.  The basic idea is that since the IOMMU
>> > > driver wants to perform some action for each possible requester ID the
>> > > IOMMU might see, PCI could provide an iterator
>> > > ("pci_for_each_requester_id()") to do that.
>> > >
>> > > Bjorn
>> > >
>> > >
>> > > commit afad51492c6672b96c2b0735600d5695e30f7180
>> > > Author: Bjorn Helgaas <bhelgaas@google.com>
>> > > Date:   Wed Jul 3 16:04:26 2013 -0600
>> > >
>> > >     pci-add-for-each-requester-id
>> > >
>> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> > > index d0627fa..380eb03 100644
>> > > --- a/drivers/pci/search.c
>> > > +++ b/drivers/pci/search.c
>> > > @@ -17,6 +17,89 @@
>> > >  DECLARE_RWSEM(pci_bus_sem);
>> > >  EXPORT_SYMBOL_GPL(pci_bus_sem);
>> > >
>> > > +#define PCI_REQUESTER_ID(dev)    (((dev)->bus->number << 8) | (dev)->devfn)
>> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge)  ((bridge)->subordinate->number << 8)
>> > > +
>> > > +static inline bool pci_is_pcix(struct pci_dev *dev)
>> > > +{
>> > > + return !!pci_pcix_cap(dev);     /* XXX not implemented */
>> > > +}
>> > > +
>> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
>> > > +{
>> > > + /*
>> > > +  * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
>> > > +  * Spec v1.0, sec 2.3.
>> > > +  */
>> > > + if (pci_is_pcie(bridge) &&
>> > > +     pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
>> > > +         return true;
>> >
>> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
>> > capability are exactly what causes the current bug.
>>
>> Likely so.  We might be able to identify that as "secondary bus is not
>> PCIe" or something.
>>
>> > > +
>> > > + /*
>> > > +  * A PCI-X to PCI-X bridge need not take ownership because there
>> > > +  * are requester IDs on the secondary PCI-X bus.  However, if a PCI
>> > > +  * device is added on the secondary bus, the bridge must revert to
>> > > +  * being a PCI-X to PCI bridge, and it then *would* take ownership.
>> > > +  * Assuming a PCI-X to PCI-X bridge takes ownership means we can
>> > > +  * tolerate a future hot-add without having to change existing
>> > > +  * IOMMU mappings.
>> > > +  */
>> > > + if (pci_is_pcix(bridge))
>> > > +         return true;
>> > > +
>> > > + return false;
>> > > +}
>> > > +
>> > > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
>> > > +                                  struct pci_dev *dev)
>> > > +{
>> > > + struct pci_dev *bridge;
>> > > + struct pci_bus *child_bus;
>> > > + u8 secondary, subordinate, busn = dev->bus->number;
>> > > +
>> > > + if (dev->bus == bus)
>> > > +         return dev;
>> > > +
>> > > + /*
>> > > +  * There may be several devices on "bus".  Find the one that is a
>> > > +  * bridge leading to "dev".
>> > > +  */
>> > > + list_for_each_entry(bridge, &bus->devices, bus_list) {
>> > > +         child_bus = bridge->subordinate;
>> > > +         if (child_bus) {
>> > > +                 secondary = child_bus->busn_res.start;
>> > > +                 subordinate = child_bus->busn_res.end;
>> > > +                 if (secondary <= busn && busn <= subordinate)
>> > > +                         return bridge;
>> > > +         }
>> > > + }
>> > > + return NULL;
>> > > +}
>> > > +
>> > > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
>> > > +                       int (*fn)(struct pci_dev *, void *),
>> >
>> > u16 requester_id
>> >
>> > > +                       void *data)
>> > > +{
>> > > + int ret;
>> > > +
>> > > + dev = pci_get_dma_source(dev);  /* XXX ref count screwup */
>> >
>> > It's unfortunate that our dma quirk still seems to only get called for
>> > the end device where we could have bridges that need it too.
>>
>> Sure, we can do that if necessary.
>>
>> I'm not trying to present this as any kind of finished solution.  The
>> point is to figure out if we can make an interface that's a little
>> more abstract so we can pull the PCI topology issues out of the IOMMU
>> drivers, and I just thought something that looked like code would be
>> more concrete and easier to discuss.
>>
>> > > +
>> > > + while (bridge != dev) {
>> >
>> >
>> > I'm having a hard time understanding this function since bridge is never
>> > defined in any of the below sample code.  Would bridge be the root port
>> > above a device?  And what about devices connected to the RC?  Seems like
>> > we'd need to traverse up the bus first, so there's a bit of missing
>> > magic.
>>
>> I was assuming that an IOMMU could be either at the root of a PCI
>> hierarchy or buried farther down.  If that's true, the IOMMU driver
>> might need to supply the top of the PCI hierarchy for which it
>> translates.  That's what I'm trying to get at with the "bridge"
>> argument.
>>
>> If the IOMMU is buried, I assumed there was no point in iterating over
>> bridges upstream of the IOMMU because it wouldn't see requester IDs of
>> those upstream bridges.  PCI wouldn't have any way to figure out how
>> far up to go, hence the argument.
>
> I don't think that getting the bridge is going to be quite so simple.
> VT-d can have multiple IOMMUs, each handling a specific device or
> hierarchy.  One IOMMU per domain can be default IOMMU for any device not
> otherwise specified.  The IOMMU is not a PCI enumerable device.  AMD-Vi
> does have a PCI visible IOMMU, but it's an endpoint on the root complex,
> not the top of a hierarchy.

If the only possible topology is that the IOMMU sees transactions
coming out of a device on a root bus, then a bridge parameter is
unnecessary, and we can just look at every bridge between the root bus
and the device.  I just didn't know whether that was the only possible
topology.

If an IOMMU can be placed such that it's like a PCIe switch that
happens to translate DMA addresses before forwarding upstream, then we
might want a bridge parameter that identifies the "downstream port" of
the IOMMU or the upstream port of a switch below the IOMMU.  Otherwise
we would generate requester IDs unnecessarily for bridges above the
IOMMU.

> I'm warming up to your approach, but I think I'd rather see something
> like:
>
> struct pci_dev *pci_get_requester_id(struct pci_dev *pdev)

This returns a "pci_dev *", not a requester ID, so for bridges that
can take ownership, this depends on there being a pci_dev on the
secondary bus with dev/fn == 0.  Maybe that's always true; I'm not
sure.

> {
>         do {
>                 /* Apply DMA quirks at every step */
>                 pdev = pci_get_dma_source(pdev);
>
>                 /* Done for PCIe devices or broken bridges, PCI-X TBD */
>                 if (pci_is_pcie(pdev) || /* broken pcie bridge test */)
>                         break;
>
>         } while (!pci_is_root_bus(pdev->bus) && (pdev = pdev->bus->self));
>
>         return pdev;

I'm not sure it's sufficient to stop when we find a PCIe bridge.  For
example, in this topology:

  00:1c.0 PCIe root port
  01:00.0 PCIe-to-PCI bridge
  02:00.0 PCI-to-PCIe reverse bridge
  03:00.0 PCIe switch upstream port
  04:00.0 PCIe switch downstream port
  05:00.0 PCIe endpoint

the IOMMU will only see requester IDs of 02:00.0, even though if we
search up from 05:00.0 we find a PCIe device at 04:00.0.  Maybe this
is too contrived, but reverse bridges are sometimes used on add-in
cards so a new part can be used in older systems, e.g., in this case
the add-in card would contain 02:00.0 and everything below it.

> }
>
>
> /* Start with requester ID pdev and pdev->bus, this way we handle both peer
>  * and parent requester IDs */
> int pci_for_each_requester_id(struct pci_dev *pdev, struct pci_bus *bus,
>                                      int (*fn)(struct pci_dev *, void *), void *data)
> {
>         struct pci_dev *tmp;
>         int ret;
>
>         list_for_each_entry(tmp, &bus->devices, bus_list) {
>                 if (pci_get_requester_id(tmp) == pdev) {
>                         ret = fn(tmp, data);

What does the IOMMU driver supply for "bus"?  Presumably it doesn't
supply "pdev->bus" because you're traversing downward
("tmp->subordinate" below).

>                         if (ret)
>                                 ...
>
>                         if (tmp->subordinate) {
>                                 ret = pci_for_each_requester_id(pdev, tmp->subordinate, fn);
>                                 if (ret)
>                                         ...
>                         }
>                 }
>         }
>
>         return ret;
> }
>
> Seems like that would be sufficient to leverage the intel-iommu changes
> you've drafted.  Thanks,
>
> Alex
>
>> > > +         if (pci_bridge_may_take_ownership(bridge)) {
>> > > +                 ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
>> > > +                 if (ret)
>> > > +                         return ret;
>> > > +         }
>> > > +         bridge = pci_bridge_to_dev(bridge->subordinate, dev);
>> > > + }
>> > > +
>> > > + if (pci_is_pcie(dev) || pci_is_pcix(dev))
>> > > +         return fn(dev, PCI_REQUESTER_ID(dev), data);
>> > > +
>> > > + /* Conventional PCI has no requester ID */
>> > > + return 0;
>> > > +}
>> > > +
>> > >  /*
>> > >   * find the upstream PCIe-to-PCI bridge of a PCI device
>> > >   * if the device is PCIE, return NULL
>> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
>> > > index 3a24e4f..cbcc82f 100644
>> > > --- a/include/linux/pci.h
>> > > +++ b/include/linux/pci.h
>> > > @@ -794,6 +794,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
>> > >  }
>> > >  struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
>> > >  int pci_dev_present(const struct pci_device_id *ids);
>> > > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
>> > > +                        int (*fn)(struct pci_dev *, void *), void *data);
>> > >
>> > >  int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
>> > >                        int where, u8 *val);
>> > >
>> > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
>> > > Author: Bjorn Helgaas <bhelgaas@google.com>
>> > > Date:   Wed Jul 3 12:02:15 2013 -0600
>> > >
>> > >     intel-iommu-upstream-pcie
>> >
>> > I assume this one would is also going to apply dma quirks to dma_ops,
>> > which I'm all for.
>> >
>> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> > > index b91564d..bdb6e6a 100644
>> > > --- a/drivers/iommu/intel-iommu.c
>> > > +++ b/drivers/iommu/intel-iommu.c
>> > > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
>> > >   return 0;
>> > >  }
>> > >
>> > > +struct domain_context_info {
>> > > + struct dmar_domain *domain;
>> > > + struct pci_dev *dev;
>> > > + int translation;
>> > > + struct intel_iommu *iomumu;
>> >
>> > s/iomumu/iommu/
>> >
>> > > + bool mapped;
>> > > +};
>> > > +
>> > > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + int segment = pci_domain_nr(data->dev);
>> >
>> > dev_info->dev
>> >
>> > > + u8 bus = data->requester_id >> 8;
>> >
>> > s/data->//
>> >
>> > > + u8 devfn = data->requester_id & 0xff;
>> >
>> > s/data->//
>>
>> I told you it wouldn't even compile :)  I'm only trying to present a
>> possibility, not all the nitty-gritty details.
>>
>> > > +
>> > > + return domain_context_mapping_one(data->domain, segment,
>> > > +                                   bus, devfn, data->translation);
>> > > +}
>> > > +
>> > >  static int domain_context_mapping(struct dmar_domain *domain,
>> > >                             struct pci_dev *pdev, int translation)
>> > >  {
>> > > - int ret;
>> > > - struct pci_dev *tmp, *parent;
>> > > + struct domain_context_info data;
>> > > + struct pci_dev *bridge;
>> > >
>> > > - ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
>> > > -                                  pdev->bus->number, pdev->devfn,
>> > > -                                  translation);
>> > > - if (ret)
>> > > -         return ret;
>> > > + data.domain = domain;
>> > > + data.dev = pdev;
>> > > + data.translation = translation;
>> > > + bridge = xx;
>> >
>> > Darn, I was hoping you were going to reveal where bridge comes from.
>> >
>> > > + return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
>> > > +}
>> > >
>> > > - /* dependent device mapping */
>> > > - tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - if (!tmp)
>> > > -         return 0;
>> > > - /* Secondary interface's bus number and devfn 0 */
>> > > - parent = pdev->bus->self;
>> > > - while (parent != tmp) {
>> > > -         ret = domain_context_mapping_one(domain,
>> > > -                                          pci_domain_nr(parent->bus),
>> > > -                                          parent->bus->number,
>> > > -                                          parent->devfn, translation);
>> > > -         if (ret)
>> > > -                 return ret;
>> > > -         parent = parent->bus->self;
>> > > - }
>> > > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
>> > > -         return domain_context_mapping_one(domain,
>> > > -                                 pci_domain_nr(tmp->subordinate),
>> > > -                                 tmp->subordinate->number, 0,
>> > > -                                 translation);
>> > > - else /* this is a legacy PCI bridge */
>> > > -         return domain_context_mapping_one(domain,
>> > > -                                           pci_domain_nr(tmp->bus),
>> > > -                                           tmp->bus->number,
>> > > -                                           tmp->devfn,
>> > > -                                           translation);
>> > > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id, void *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + u8 bus = data->requester_id >> 8;
>> > > + u8 devfn = data->requester_id & 0xff;
>> > > + bool ret;
>> > > +
>> > > + ret = device_context_mapped(dev_info->iommu, bus, devfn);
>> > > + if (!ret)
>> > > +         dev_info->mapped = false;
>> > > +
>> > > + return 0;
>> > >  }
>> > >
>> > >  static bool domain_context_mapped(struct pci_dev *pdev)
>> > >  {
>> > > - bool ret;
>> > > - struct pci_dev *tmp, *parent;
>> > > + struct domain_context_info data;
>> > >   struct intel_iommu *iommu;
>> > > + struct pci_dev *bridge;
>> > >
>> > >   iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
>> > >                           pdev->devfn);
>> > >   if (!iommu)
>> > >           return false;
>> > >
>> > > - ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
>> > > - if (!ret)
>> > > -         return false;
>> > > - /* dependent device mapping */
>> > > - tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - if (!tmp)
>> > > -         return true;
>> > > - /* Secondary interface's bus number and devfn 0 */
>> > > - parent = pdev->bus->self;
>> > > - while (parent != tmp) {
>> > > -         ret = device_context_mapped(iommu, parent->bus->number,
>> > > -                                     parent->devfn);
>> > > -         if (!ret)
>> > > -                 return false;
>> > > -         parent = parent->bus->self;
>> > > - }
>> > > - if (pci_is_pcie(tmp))
>> > > -         return device_context_mapped(iommu, tmp->subordinate->number,
>> > > -                                      0);
>> > > - else
>> > > -         return device_context_mapped(iommu, tmp->bus->number,
>> > > -                                      tmp->devfn);
>> > > + data.iommu = iommu;
>> > > + data.mapped = true;
>> > > + bridge = xx;
>> > > + pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
>> > > + return data.mapped;
>> > >  }
>> > >
>> > >  /* Returns a number of VTD pages, but aligned to MM page size */
>> > > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct pci_dev *pdev)
>> > >   return NULL;
>> > >  }
>> > >
>> > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + int segment = pci_domain_nr(data->dev);
>> > > + u8 bus = data->requester_id >> 8;
>> > > + u8 devfn = data->requester_id & 0xff;
>> > > + unsigned long flags;
>> > > + struct device_domain_info *info;
>> > > +
>> > > + spin_lock_irqsave(&device_domain_lock, flags);
>> > > + list_for_each_entry(info, &device_domain_list, global) {
>> > > +         if (info->segment == segment &&
>> > > +             info->bus == bus && info->devfn == devfn) {
>> > > +                 data->domain = info->domain;
>> > > +                 break;
>> > > +         }
>> > > + }
>> > > + spin_unlock_irqrestore(&device_domain_lock, flags);
>> > > + return 0;
>> > > +}
>> > > +
>> > >  /* domain is initialized */
>> > >  static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>> > >  {
>> > > @@ -1968,11 +1978,11 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>> > >   struct intel_iommu *iommu;
>> > >   struct dmar_drhd_unit *drhd;
>> > >   struct device_domain_info *info, *tmp;
>> > > - struct pci_dev *dev_tmp;
>> > >   unsigned long flags;
>> > >   int bus = 0, devfn = 0;
>> > >   int segment;
>> > >   int ret;
>> > > + struct domain_context_info data;
>> > >
>> > >   domain = find_domain(pdev);
>> > >   if (domain)
>> > > @@ -1980,29 +1990,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>> > >
>> > >   segment = pci_domain_nr(pdev->bus);
>> > >
>> > > - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - if (dev_tmp) {
>> > > -         if (pci_is_pcie(dev_tmp)) {
>> > > -                 bus = dev_tmp->subordinate->number;
>> > > -                 devfn = 0;
>> > > -         } else {
>> > > -                 bus = dev_tmp->bus->number;
>> > > -                 devfn = dev_tmp->devfn;
>> > > -         }
>> > > -         spin_lock_irqsave(&device_domain_lock, flags);
>> > > -         list_for_each_entry(info, &device_domain_list, global) {
>> > > -                 if (info->segment == segment &&
>> > > -                     info->bus == bus && info->devfn == devfn) {
>> > > -                         found = info->domain;
>> > > -                         break;
>> > > -                 }
>> > > -         }
>> > > -         spin_unlock_irqrestore(&device_domain_lock, flags);
>> > > -         /* pcie-pci bridge already has a domain, uses it */
>> > > -         if (found) {
>> > > -                 domain = found;
>> > > -                 goto found_domain;
>> > > -         }
>> > > + data.domain = NULL;
>> > > + data.dev = pdev;
>> > > + bridge = xx;
>> > > + pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
>> > > + if (data.domain) {
>> > > +         domain = data.domain;
>> > > +         goto found_domain;
>> > >   }
>> > >
>> > >   domain = alloc_domain();
>> > > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
>> > >   return 0;
>> > >  }
>> > >
>> > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + struct intel_iommu *iommu = data->iommu;
>> > > + u8 bus = data->requester_id >> 8;
>> > > + u8 devfn = data->requester_id & 0xff;
>> > > +
>> > > + iommu_detach_dev(iommu, bus, devfn);
>> > > + return 0;
>> > > +}
>> > > +
>> > >  static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
>> > >                                      struct pci_dev *pdev)
>> > >  {
>> > > - struct pci_dev *tmp, *parent;
>> > > + struct domain_context_info data;
>> > >
>> > >   if (!iommu || !pdev)
>> > >           return;
>> > >
>> > >   /* dependent device detach */
>> > > - tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - /* Secondary interface's bus number and devfn 0 */
>> > > - if (tmp) {
>> > > -         parent = pdev->bus->self;
>> > > -         while (parent != tmp) {
>> > > -                 iommu_detach_dev(iommu, parent->bus->number,
>> > > -                                  parent->devfn);
>> > > -                 parent = parent->bus->self;
>> > > -         }
>> > > -         if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
>> > > -                 iommu_detach_dev(iommu,
>> > > -                         tmp->subordinate->number, 0);
>> > > -         else /* this is a legacy PCI bridge */
>> > > -                 iommu_detach_dev(iommu, tmp->bus->number,
>> > > -                                  tmp->devfn);
>> > > - }
>> > > + data.iommu = iommu;
>> > > + bridge = xx;
>> > > + pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
>> > >  }
>> > >
>> > >  static void domain_remove_one_dev_info(struct dmar_domain *domain,
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> > > the body of a message to majordomo@vger.kernel.org
>> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-30 11:52     ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2013-07-30 11:52 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, stephen, linux-pci, ddutile

(sorry for the delay)

On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> Where do we stand on this series?  You had a concern that the heuristic
> used in patch 1/ could be dangerous.  The suggestion for detecting the
> issue was actually from Bjorn who replied with his rationale.  Do you
> want to go in the direction of a fixed whitelist or do you agree that
> even if the heuristic breaks it provides better behavior than what we
> have now?  Thanks,

So if this workaround should still live in the IOMMU code (after your
discussion with Bjorn) you can either convince me that the heuristic
will never fail or you resend a version that uses a list of known to be
broken bridges to match against.


	Joerg



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges
@ 2013-07-30 11:52     ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2013-07-30 11:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ

(sorry for the delay)

On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> Where do we stand on this series?  You had a concern that the heuristic
> used in patch 1/ could be dangerous.  The suggestion for detecting the
> issue was actually from Bjorn who replied with his rationale.  Do you
> want to go in the direction of a fixed whitelist or do you agree that
> even if the heuristic breaks it provides better behavior than what we
> have now?  Thanks,

So if this workaround should still live in the IOMMU code (after your
discussion with Bjorn) you can either convince me that the heuristic
will never fail or you resend a version that uses a list of known to be
broken bridges to match against.


	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-07-30 11:53 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-28 18:40 [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges Alex Williamson
2013-05-28 18:40 ` [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function Alex Williamson
2013-05-28 18:40   ` Alex Williamson
2013-05-28 19:38   ` Stephen Hemminger
2013-05-28 19:53     ` Alex Williamson
2013-05-28 19:56       ` Stephen Hemminger
2013-05-28 20:15         ` Alex Williamson
2013-05-28 20:28           ` Stephen Hemminger
2013-06-20 13:59   ` Joerg Roedel
2013-06-20 13:59     ` Joerg Roedel
2013-06-20 15:44     ` Alex Williamson
2013-06-20 16:15       ` Joerg Roedel
2013-06-20 16:15         ` Joerg Roedel
2013-06-26  4:20         ` Bjorn Helgaas
2013-06-26 18:45           ` Alex Williamson
2013-06-26 18:45             ` Alex Williamson
2013-06-26 19:11             ` Bjorn Helgaas
2013-05-28 18:40 ` [PATCH v2 2/2] intel-iommu: Convert to iommu_pci_find_upstream + iommu_pci_is_pcie_bridge Alex Williamson
2013-05-28 22:09 ` [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges Bjorn Helgaas
2013-05-28 22:09   ` Bjorn Helgaas
2013-05-28 22:53 ` [PATCH v2 3/2] pci: Remove unused pci_find_upstream_pcie_bridge() Alex Williamson
2013-05-28 22:53   ` Alex Williamson
2013-07-08 17:07 ` [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges Alex Williamson
2013-07-08 17:07   ` Alex Williamson
2013-07-08 19:34   ` Bjorn Helgaas
2013-07-08 19:34     ` Bjorn Helgaas
2013-07-08 20:49     ` Alex Williamson
2013-07-08 20:49       ` Alex Williamson
2013-07-08 21:51       ` Bjorn Helgaas
2013-07-09 18:27         ` Alex Williamson
2013-07-09 18:27           ` Alex Williamson
2013-07-09 20:10           ` Bjorn Helgaas
2013-07-30 11:52   ` Joerg Roedel
2013-07-30 11:52     ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.