* [PATCH v4 0/2] Handle Cavium ThunderX2 PCI topology quirk
@ 2017-04-03 13:15 ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-03 13:15 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, Alex Williamson, iommu
  Cc: Jon Masters, Robin Murphy, Jayachandran C, linux-arm-kernel

Hi Bjorn, Alex,

Sending this again (with a trivial fix to author name), please review.
Updated summary below:

Here is v4 of the patchset to handle the PCIe topology quirk of Cavium
ThunderX2 systems (previously known as Broadcom Vulcan).

The earlier discussions on this can be seen at:
http://www.spinics.net/lists/linux-pci/msg51001.html
https://patchwork.ozlabs.org/patch/582633/ and
https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html

The earlier discussion on this patchset ended with a suggestion that it
may be possible to fix up this quirk by handling the issue in the
function argument of pci_for_each_dma_alias(). But at that point we did
not have the codebase to make the changes since the full ACPI and OF code
for SMMU and GIC ITS was not upstream.

Now that the changes are upstream, I tried to fix it in both the SMMU
and the GIC ITS code based on this suggestion. The changes needed are at:
 https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup

The problems with this approach are:
 - Of the 14 uses of the function pci_for_each_dma_alias() in the kernel
   tree, I have to fix up 6 callers (all but one of the callers outside
   x86).
 - 4 of these can be reasonably handled (please see the github repo above),
   but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
   drivers/iommu/iommu.c cannot be reasonably fixed up.
 - Even without those two changes I can get it to work for now. But
   pci_for_each_dma_alias() does not work as expected on this platform,
   and we would have to be aware of that for all future uses of the
   function.
  
I have ruled out that approach for now, rebased the earlier patch onto
4.11-rc, and am submitting it again for review. The changes are:

v3->v4:
 - new address of author

v2->v3:
 - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
   PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
 - updated commit message to make the quirk clearer.

Let me know your comments and suggestions.

Thanks,
JC.


Jayachandran C (2):
  PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
  PCI: quirks: Fix ThunderX2 dma alias handling

 drivers/pci/quirks.c | 14 ++++++++++++++
 drivers/pci/search.c |  4 ++++
 include/linux/pci.h  |  2 ++
 3 files changed, 20 insertions(+)

-- 
2.7.4



* [PATCH v4 1/2] PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
@ 2017-04-03 13:15   ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-03 13:15 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, Alex Williamson, iommu
  Cc: Jon Masters, Robin Murphy, Jayachandran C, linux-arm-kernel

Add a new quirk flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT to limit the DMA
alias search to go no further than the bridge where the IOMMU unit is
attached.

The flag will be used to indicate a bridge device which forwards the
address translation requests to the IOMMU, i.e. where the interrupt and
DMA requests leave the PCIe hierarchy and go into the system blocks.

Usually this happens at the PCI RC, so this flag is not needed. But
on systems where there are bridges that introduce aliases above the
"real" root bridge, this flag is needed to ensure that the function
pci_for_each_dma_alias() works correctly.

The function pci_for_each_dma_alias() is updated to stop when it sees a
bridge with this flag set.

Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
---
 drivers/pci/search.c | 4 ++++
 include/linux/pci.h  | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index 33e0f03..4c6044a 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -60,6 +60,10 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
 
 		tmp = bus->self;
 
+		/* stop at bridge where translation unit is associated */
+		if (tmp->dev_flags & PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT)
+			return ret;
+
 		/*
 		 * PCIe-to-PCI/X bridges alias transactions from downstream
 		 * devices using the subordinate bus number (PCI Express to
diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb3da1a..3f596ac 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/* a non-root bridge where translation occurs, stop alias search here */
+	PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {
-- 
2.7.4
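
For readers who want to see the effect of the new check without building a
kernel, here is a minimal userspace sketch of the alias walk. It is not
kernel code: struct model_dev, the introduces_alias and xlate_root fields,
the helper names and the requester IDs are all assumptions made up for
illustration. Only the early return on the flagged bridge mirrors the hunk
above; the real function decides whether a bridge aliases from its PCIe
type, which the sketch reduces to one boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct pci_dev (not the kernel type). */
struct model_dev {
	uint16_t rid;             /* requester ID this device would present */
	bool introduces_alias;    /* bridge type that normally aliases traffic */
	bool xlate_root;          /* models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */
	struct model_dev *parent; /* upstream bridge, NULL above the top bridge */
};

typedef int (*alias_fn)(struct model_dev *dev, uint16_t alias, void *data);

/* Report the device's own ID, then the aliases added by upstream bridges. */
static int model_for_each_dma_alias(struct model_dev *dev, alias_fn fn,
				    void *data)
{
	struct model_dev *tmp;
	int ret;

	assert(dev && fn);

	ret = fn(dev, dev->rid, data);
	if (ret)
		return ret;

	for (tmp = dev->parent; tmp; tmp = tmp->parent) {
		/* stop at the bridge where the translation unit is attached */
		if (tmp->xlate_root)
			return ret;

		if (!tmp->introduces_alias)
			continue;

		ret = fn(tmp, tmp->rid, data);
		if (ret)
			return ret;
	}
	return ret;
}

/* Callback that keeps only the last alias reported by the walk. */
static int record_last_alias(struct model_dev *dev, uint16_t alias, void *data)
{
	(void)dev;
	*(uint16_t *)data = alias;
	return 0;
}

/* Build node bridge -> glue bridge -> root port -> endpoint and walk it. */
static uint16_t last_alias_for_endpoint(bool mark_root_port)
{
	struct model_dev node = { 0x0000, true, false, NULL };
	struct model_dev glue = { 0x0008, true, false, &node };
	struct model_dev port = { 0x0100, false, mark_root_port, &glue };
	struct model_dev endpoint = { 0x0200, false, false, &port };
	uint16_t last = 0xffff;

	model_for_each_dma_alias(&endpoint, record_last_alias, &last);
	return last;
}
```

In this model, a caller that keeps the last alias ends up with the node
bridge's ID when nothing is flagged, and with the endpoint's own ID once the
root port is flagged, because the walk returns before it reaches the two
bridges above it.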



* [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-03 13:15   ` Jayachandran C
  -1 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-03 13:15 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, Alex Williamson, iommu
  Cc: Jon Masters, Robin Murphy, Jayachandran C, linux-arm-kernel

On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
PCI topology is slightly unusual. For a multi-node system, it looks like:

[node level PCI bridges - one per node]
    [SoC PCI devices with MSI-X but no IOMMU]
    [PCI-PCIe "glue" bridges - up to 14, one per real port below]
        [PCIe real root ports associated with IOMMU and GICv3 ITS]
            [External PCI devices connected to PCIe links]

The top two levels of bridges should have introduced aliases since they
are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
In the case of external PCIe devices, the "real" root ports are connected
to the SMMU and the GIC ITS, so the PCI-PCIe bridge does not introduce an
alias. The SoC PCI devices are directly connected to the GIC ITS, so the
node level bridges do not introduce an alias either.

To handle this quirk, we mark the real PCIe root ports and node level
PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
pci_for_each_dma_alias() works correctly for external PCIe devices and
SoC PCI devices.

For the current revision of Cavium ThunderX2, the VendorID and Device ID
are from Broadcom Vulcan (14e4:90XX).

Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
---
 drivers/pci/quirks.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6736836..564a84a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
 
 /*
+ * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
+ * associated not at the root bus, but at a bridge below. This quirk flag
+ * will ensure that the aliases are identified correctly.
+ */
+static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
+{
+	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
+				quirk_bridge_cavm_thrx2_pcie_root);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
+				quirk_bridge_cavm_thrx2_pcie_root);
+
+/*
  * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
  * class code.  Fix it.
  */
-- 
2.7.4
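
As a rough illustration of what the two DECLARE_PCI_FIXUP_HEADER() lines
arrange, here is a small userspace sketch of a vendor/device match table that
runs a hook on matching devices. It is not kernel code: the kernel collects
fixups in a dedicated section and also supports wildcard IDs, while this
sketch uses a plain array with exact matches. struct model_dev, struct
model_fixup and the helper names are made up for illustration; the vendor ID
0x14e4 comes from the "14e4:90XX" note in the commit message, and the flag
bit is the one added in patch 1/2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VENDOR_BROADCOM 0x14e4u   /* from "14e4:90XX" in the patch */
#define MODEL_FLAG_XLATE_ROOT (1u << 9) /* bit used by patch 1/2 */

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct model_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int dev_flags;
};

/* Hypothetical fixup entry: exact vendor/device match plus a hook. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_dev *dev);
};

/* Same job as quirk_bridge_cavm_thrx2_pcie_root(): set the flag. */
static void quirk_mark_xlate_root(struct model_dev *dev)
{
	dev->dev_flags |= MODEL_FLAG_XLATE_ROOT;
}

/* The two device IDs registered by the patch. */
static const struct model_fixup header_fixups[] = {
	{ MODEL_VENDOR_BROADCOM, 0x9000, quirk_mark_xlate_root },
	{ MODEL_VENDOR_BROADCOM, 0x9084, quirk_mark_xlate_root },
};

/* Run every hook whose vendor and device IDs match the given device. */
static void model_run_header_fixups(struct model_dev *dev)
{
	size_t i;

	assert(dev != NULL);

	for (i = 0; i < sizeof(header_fixups) / sizeof(header_fixups[0]); i++) {
		const struct model_fixup *f = &header_fixups[i];

		if (f->vendor == dev->vendor && f->device == dev->device)
			f->hook(dev);
	}
}

/* Convenience helper: the flags a device ends up with after the fixups. */
static unsigned int flags_after_fixups(uint16_t vendor, uint16_t device)
{
	struct model_dev dev = { vendor, device, 0 };

	model_run_header_fixups(&dev);
	return dev.dev_flags;
}
```

Only devices with one of the two listed IDs pick up the flag. Any other
bridge in the hierarchy, including one with a different 14e4:90XX ID, is left
alone and keeps the default alias behaviour.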



* Re: [PATCH v4 1/2] PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
@ 2017-04-03 14:59     ` Robin Murphy
  0 siblings, 0 replies; 64+ messages in thread
From: Robin Murphy @ 2017-04-03 14:59 UTC (permalink / raw)
  To: Jayachandran C, Bjorn Helgaas, linux-pci, Alex Williamson, iommu
  Cc: linux-arm-kernel, Jon Masters

On 03/04/17 14:15, Jayachandran C wrote:
> Add a new quirk flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT to limit the DMA
> alias search to go no further than the bridge where the IOMMU unit is
> attached.
> 
> The flag will be used to indicate a bridge device which forwards the
> address translation requests to the IOMMU, i.e. where the interrupt and
> DMA requests leave the PCIe hierarchy and go into the system blocks.
> 
> Usually this happens at the PCI RC, so this flag is not needed. But
> on systems where there are bridges that introduce aliases above the
> "real" root bridge, this flag is needed to ensure that the function
> pci_for_each_dma_alias() works correctly.
> 
> The function pci_for_each_dma_alias() is updated to stop when it sees a
> bridge with this flag set.

As it seems to have been me positing most of the alternative
suggestions, which have indeed turned out to have holes in (no, I can't
see how we'd cleanly fix pci_device_group() either), I'll stand by my
earlier "(I have no actual objection to this patch, though, [...])":

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
> ---
>  drivers/pci/search.c | 4 ++++
>  include/linux/pci.h  | 2 ++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 33e0f03..4c6044a 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -60,6 +60,10 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
>  
>  		tmp = bus->self;
>  
> +		/* stop at bridge where translation unit is associated */
> +		if (tmp->dev_flags & PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT)
> +			return ret;
> +
>  		/*
>  		 * PCIe-to-PCI/X bridges alias transactions from downstream
>  		 * devices using the subordinate bus number (PCI Express to
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..3f596ac 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	/* a non-root bridge where translation occurs, stop alias search here */
> +	PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
>  };
>  
>  enum pci_irq_reroute_variant {
> 


* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-03 15:07     ` Robin Murphy
  0 siblings, 0 replies; 64+ messages in thread
From: Robin Murphy @ 2017-04-03 15:07 UTC (permalink / raw)
  To: Jayachandran C, Bjorn Helgaas, linux-pci, Alex Williamson, iommu
  Cc: linux-arm-kernel, Jon Masters

On 03/04/17 14:15, Jayachandran C wrote:
> On the Cavium ThunderX2 arm64 SoCs (earlier called Broadcom Vulcan), the PCI
> topology is slightly unusual. For a multi-node system, it looks like:
> 
> [node level PCI bridges - one per node]
>     [SoC PCI devices with MSI-X but no IOMMU]
>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
>             [External PCI devices connected to PCIe links]

Since it's not entirely obvious, what does the actual DT - or IORT if
you must ;) - topology for this look like? I can't help thinking that
either it's inaccurate, or that this is going to expose a shortcoming in
pci_dma_configure() which breaks things - unless I'm missing something,
isn't find_pci_root_bus() going to go all the way up to the top-level
glue bridge and pick up the wrong firmware node (if any) for the
appropriate DMA properties?

Robin.

> The top two levels of bridges should have introduced aliases since they
> are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> In the case of external PCIe devices, the "real" root ports are connected
> to the SMMU and the GIC ITS, so the PCI-PCIe bridge does not introduce an
> alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> node level bridges do not introduce an alias either.
> 
> To handle this quirk, we mark the real PCIe root ports and node level
> PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> pci_for_each_dma_alias() works correctly for external PCIe devices and
> SoC PCI devices.
> 
> For the current revision of Cavium ThunderX2, the VendorID and Device ID
> are from Broadcom Vulcan (14e4:90XX).
> 
> Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
> ---
>  drivers/pci/quirks.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..564a84a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
>  
>  /*
> + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
> + * associated not at the root bus, but at a bridge below. This quirk flag
> + * will ensure that the aliases are identified correctly.
> + */
> +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
> +{
> +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
> +				quirk_bridge_cavm_thrx2_pcie_root);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
> +				quirk_bridge_cavm_thrx2_pcie_root);
> +
> +/*
>   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
>   * class code.  Fix it.
>   */
> 
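
As a rough illustration of what the two DECLARE_PCI_FIXUP_HEADER() lines above arrange, the sketch below models a header-fixup table in plain C. It is an assumption-laden model, not the kernel mechanism: `struct model_pci_dev`, `struct model_fixup`, `model_apply_header_fixups()` and `model_flags_for()` are hypothetical names. The vendor ID 0x14e4, the device IDs 0x9000 and 0x9084, and the flag value (1 << 9) are taken from the patches in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_VENDOR_BROADCOM        0x14e4
#define MODEL_FLAG_BRIDGE_XLATE_ROOT (1u << 9)

/* Hypothetical, cut-down stand-in for struct pci_dev. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int dev_flags;
};

/* One table entry: which device to match and which hook to run. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_pci_dev *pdev);
};

/* Same effect as the quirk body in the patch: set the flag. */
static void mark_xlate_root(struct model_pci_dev *pdev)
{
	pdev->dev_flags |= MODEL_FLAG_BRIDGE_XLATE_ROOT;
}

static const struct model_fixup header_fixups[] = {
	{ MODEL_VENDOR_BROADCOM, 0x9000, mark_xlate_root },
	{ MODEL_VENDOR_BROADCOM, 0x9084, mark_xlate_root },
};

/* Run every hook whose vendor/device pair matches the device. */
void model_apply_header_fixups(struct model_pci_dev *pdev)
{
	size_t i;

	for (i = 0; i < sizeof(header_fixups) / sizeof(header_fixups[0]); i++) {
		const struct model_fixup *f = &header_fixups[i];

		if (f->vendor == pdev->vendor && f->device == pdev->device)
			f->hook(pdev);
	}
}

/* Convenience wrapper: flags a freshly scanned device would end up with. */
unsigned int model_flags_for(uint16_t vendor, uint16_t device)
{
	struct model_pci_dev pdev = { vendor, device, 0 };

	model_apply_header_fixups(&pdev);
	return pdev.dev_flags;
}
```

Only the two listed bridge IDs pick up the flag in this model; other functions such as 14e4:9026 or an external 8086:1583 card are left untouched.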

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-04 11:50       ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-04 11:50 UTC (permalink / raw)
  To: Robin Murphy
  Cc: linux-pci, iommu, Alex Williamson, Bjorn Helgaas, Jon Masters,
	linux-arm-kernel

On Mon, Apr 03, 2017 at 04:07:53PM +0100, Robin Murphy wrote:
> On 03/04/17 14:15, Jayachandran C wrote:
> > On the Cavium ThunderX2 arm64 SoCs (earlier called Broadcom Vulcan), the PCI
> > topology is slightly unusual. For a multi-node system, it looks like:
> > 
> > [node level PCI bridges - one per node]
> >     [SoC PCI devices with MSI-X but no IOMMU]
> >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> >             [External PCI devices connected to PCIe links]
> 
> Since it's not entirely obvious, what does the actual DT - or IORT if
> you must ;) - topology for this look like? I can't help thinking that
> either it's inaccurate, or that this is going to expose a shortcoming in
> pci_dma_configure() which breaks things - unless I'm missing something,
> isn't find_pci_root_bus() going to go all the way up to the top-level
> glue bridge and pick up the wrong firmware node (if any) for the
> appropriate DMA properties?

I will try to describe the ACPI interface:

There is just one ECAM area, a single bus range and one set of memory
windows for the whole system - so there is just one entry in DSDT for
the PCI controller. This entry also corresponds to the PCI RC node in
IORT. DMA is coherent and 64-bit capable system-wide; the attributes
(in DSDT and IORT) reflect this.

lspci on the system looks like this:
-[0000:00]-+-00.0-[01-1e]--+-04.0  14e4:9026
           |               +-04.1  14e4:9026
           |               +-05.0  14e4:9027
           |               +-05.1  14e4:9027
           |               +-0a.0-[02-03]----00.0-[03]--
           |               +-0a.1-[04-05]----00.0-[05]--
           |           [...etc...]
           |               +-0b.0-[12-14]----00.0-[13-14]--+-00.0  8086:1583
           |               |                               \-00.1  8086:1583
           |           [...etc...]
           |               \-0b.5-[1d-1e]----00.0-[1e]--
           \-00.1-[1f-3b]--+-04.0  14e4:9026
                           +-04.1  14e4:9026
                           +-05.0  14e4:9027
                           +-05.1  14e4:9027
                           +-0a.0-[20-21]----00.0-[21]--
                       [...etc...]

The devices here are:
 - 00:00.0 and 00:00.1 are the node (socket) level bridges
 - 01:[45].x and 1f:[45].x are SoC PCI devices like SATA and USB
 - 01:[ab].x and 1f:[ab].x are the PCI-PCIe "reverse"/glue bridges
 - 02:00.0 etc. are the "real" PCIe ports connected to external PCIe cards.
Each node has a GIC ITS, and each group of 4 PCIe ports shares an SMMU.

The IORT is built by the firmware based on its PCI enumeration. The IORT
will have multiple entries under the PCI RC node:
 - one entry per node to map the SoC devices directly to ITS for MSI-X,
   since the SoC devices are not attached to any SMMU.
 - one entry per "real" PCIe port to map RIDs under it to the corresponding
   SMMU.
Each SMMU node will have an entry to map its RID ranges to the node ITS.

The IORT spec supports this configuration, and the corresponding code is
already upstream, so the only sticking point right now is
pci_for_each_dma_alias().
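
To make the RID ranges above concrete, here is a small sketch that encodes a bus/device/function triple as a 16-bit requester ID and looks up which node it belongs to. The helper names (`model_rid()`, `model_node_of_bus()`, `model_node_of_rid()`) are hypothetical, and the bus ranges are copied from this particular lspci listing; they are an assumption about this one enumeration, not fixed properties of the hardware.

```c
#include <stdint.h>

/* Requester ID layout: bus in bits 15:8, device in 7:3, function in 2:0. */
uint16_t model_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}

/*
 * Node lookup using the bus ranges from the listing above:
 * 00:00.0 spans buses 01-1e, 00:00.1 spans buses 1f-3b.
 * Returns -1 for bus 00 (the node-level bridges) and anything else.
 */
int model_node_of_bus(uint8_t bus)
{
	if (bus >= 0x01 && bus <= 0x1e)
		return 0;
	if (bus >= 0x1f && bus <= 0x3b)
		return 1;
	return -1;
}

/* Same lookup, starting from a requester ID. */
int model_node_of_rid(uint16_t rid)
{
	return model_node_of_bus((uint8_t)(rid >> 8));
}
```

For example, the external card at 13:00.0 has requester ID 0x1300 and falls under node 0, which is the kind of range the per-node and per-port IORT entries have to cover.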

JC.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-04 11:50       ` Jayachandran C
  (?)
@ 2017-04-04 14:28         ` Robin Murphy
  -1 siblings, 0 replies; 64+ messages in thread
From: Robin Murphy @ 2017-04-04 14:28 UTC (permalink / raw)
  To: Jayachandran C
  Cc: Bjorn Helgaas, linux-pci, Alex Williamson, iommu,
	linux-arm-kernel, Jon Masters

On 04/04/17 12:50, Jayachandran C wrote:
> On Mon, Apr 03, 2017 at 04:07:53PM +0100, Robin Murphy wrote:
>> On 03/04/17 14:15, Jayachandran C wrote:
>>> On the Cavium ThunderX2 arm64 SoCs (earlier called Broadcom Vulcan), the PCI
>>> topology is slightly unusual. For a multi-node system, it looks like:
>>>
>>> [node level PCI bridges - one per node]
>>>     [SoC PCI devices with MSI-X but no IOMMU]
>>>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
>>>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
>>>             [External PCI devices connected to PCIe links]
>>
>> Since it's not entirely obvious, what does the actual DT - or IORT if
>> you must ;) - topology for this look like? I can't help thinking that
>> either it's inaccurate, or that this is going to expose a shortcoming in
>> pci_dma_configure() which breaks things - unless I'm missing something,
>> isn't find_pci_root_bus() going to go all the way up to the top-level
>> glue bridge and pick up the wrong firmware node (if any) for the
>> appropriate DMA properties?
> 
> I will try to describe the ACPI interface:
> 
> There is just one ECAM area, a single bus range and one set of memory
> windows for the whole system - so there is just one entry in DSDT for
> the PCI controller. This entry also corresponds to the PCI RC node in
> IORT. DMA is coherent and 64-bit capable system-wide; the attributes
> (in DSDT and IORT) reflect this.
> 
> lspci on the system looks like this:
> -[0000:00]-+-00.0-[01-1e]--+-04.0  14e4:9026
>            |               +-04.1  14e4:9026
>            |               +-05.0  14e4:9027
>            |               +-05.1  14e4:9027
>            |               +-0a.0-[02-03]----00.0-[03]--
>            |               +-0a.1-[04-05]----00.0-[05]--
>            |           [...etc...]
>            |               +-0b.0-[12-14]----00.0-[13-14]--+-00.0  8086:1583
>            |               |                               \-00.1  8086:1583
>            |           [...etc...]
>            |               \-0b.5-[1d-1e]----00.0-[1e]--
>            \-00.1-[1f-3b]--+-04.0  14e4:9026
>                            +-04.1  14e4:9026
>                            +-05.0  14e4:9027
>                            +-05.1  14e4:9027
>                            +-0a.0-[20-21]----00.0-[21]--
>                        [...etc...]
> 
> The devices here are:
>  - 00:00.0 and 00:00.1 are the node (socket) level bridges
>  - 01:[45].x and 1f:[45].x are SoC PCI devices like SATA and USB
>  - 01:[ab].x and 1f:[ab].x are the PCI-PCIe "reverse"/glue bridges
>  - 02:00.0 etc. are the "real" PCIe ports connected to external PCIe cards.
> Each node has a GIC ITS, and each group of 4 PCIe ports shares an SMMU.
> 
> The IORT is built by the firmware based on its PCI enumeration. The IORT
> will have multiple entries under the PCI RC node:
>  - one entry per node to map the SoC devices directly to ITS for MSI-X,
>    since the SoC devices are not attached to any SMMU.
>  - one entry per "real" PCIe port to map RIDs under it to the corresponding
>    SMMU.
> Each SMMU node will have an entry to map its RID ranges to the node ITS.
> 
> The IORT spec supports this configuration, and the corresponding code is
> already upstream, so the only sticking point right now is
> pci_for_each_dma_alias().

Thanks, that helps a lot. The "single global ECAM space" idea was
eluding me, but in that context it all makes much more sense - I'm
assuming the two quirked device IDs correspond to the 00:00.[01] devices
and the [02-1e]:00.0 ones.

So (at the risk of Jon mooing at me), I guess the DT description would
be a single node looking something like:

pcie {
	reg = [global ECAM space for segment 0000];

	msi-map = <0x0100 &its0 0x0100 0x1d00>,
		  <0x1f00 &its1 0x1f00 0x1d00>;
	iommu-map = <0x0200 &smmu0 0x0200 0x1c00>,
		    <0x2000 &smmu0 0x2000 0x1c00>;
};

(note to self: which incidentally also means of_pci_map_rid() probably
wants fixing to not treat gaps in the map as an error)
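
A sketch of the gap-tolerant lookup hinted at in the note above, assuming a simplified table format: `struct rid_map_entry`, `map_rid()` and `msi_target()` are hypothetical names, the integer `target` stands in for a phandle, and the two entries mirror the guessed msi-map in this mail. This is not the of_pci_map_rid() implementation; it only shows a miss being reported as "no translation" with the RID passed through, rather than as a failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One (rid-base, target, out-base, length) tuple, as in msi-map. */
struct rid_map_entry {
	uint16_t rid_base;
	int target;        /* hypothetical index standing in for a phandle */
	uint16_t out_base;
	uint16_t length;
};

/* Entries copied from the guessed msi-map: target 0 = its0, 1 = its1. */
static const struct rid_map_entry msi_map[] = {
	{ 0x0100, 0, 0x0100, 0x1d00 },
	{ 0x1f00, 1, 0x1f00, 0x1d00 },
};

/*
 * Translate rid through the map. On a hit, fill in target/out and return
 * true. On a miss (a gap in the map), pass the RID through unchanged, set
 * target to -1 and return false, leaving the caller to decide what to do.
 */
bool map_rid(const struct rid_map_entry *map, size_t n, uint16_t rid,
	     int *target, uint16_t *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct rid_map_entry *e = &map[i];

		if (rid < e->rid_base || rid - e->rid_base >= e->length)
			continue;
		*target = e->target;
		*out = (uint16_t)(e->out_base + (rid - e->rid_base));
		return true;
	}
	*target = -1;
	*out = rid;
	return false;
}

/* Convenience wrapper: which ITS index a RID maps to, or -1 for a gap. */
int msi_target(uint16_t rid)
{
	int target;
	uint16_t out;

	map_rid(msi_map, sizeof(msi_map) / sizeof(msi_map[0]), rid,
		&target, &out);
	return target;
}
```

With this shape, the node-level bridges on bus 00, which fall outside both ranges, simply come back as unmapped instead of aborting the lookup.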

With only one node like that, rather than having the whole first 3
levels of bridges described, the "stop at the appropriate node in the
callback" approach does become even more impractical in all cases. So,
for $TITLE, based on the above understanding:

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

Cheers,
Robin.

> 
> JC.
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-04 14:28         ` Robin Murphy
  0 siblings, 0 replies; 64+ messages in thread
From: Robin Murphy @ 2017-04-04 14:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/04/17 12:50, Jayachandran C wrote:
> On Mon, Apr 03, 2017 at 04:07:53PM +0100, Robin Murphy wrote:
>> On 03/04/17 14:15, Jayachandran C wrote:
>>> On the Cavium ThunderX2 arm64 SoCs (earlier called Broadcom Vulcan), the PCI
>>> topology is slightly unusual. For a multi-node system, it looks like:
>>>
>>> [node level PCI bridges - one per node]
>>>     [SoC PCI devices with MSI-X but no IOMMU]
>>>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
>>>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
>>>             [External PCI devices connected to PCIe links]
>>
>> Since it's not entirely obvious, what does the actual DT - or IORT if
>> you must ;) - topology for this look like? I can't help thinking that
>> either it's inaccurate, or that this is going to expose a shortcoming in
>> pci_dma_configure() which breaks things - unless I'm missing something,
>> isn't find_pci_root_bus() going to go all the way up to the top-level
>> glue bridge and pick up the wrong firmware node (if any) for the
>> appropriate DMA properties?
> 
> I will try to describe the ACPI interface:
> 
> There is just one ECAM area, a single bus range and one set of memory
> windows for the whole system - so there is just one entry in DSDT for
> the PCI controller. This entry also corresponds to the PCI RC node in
> IORT. DMA is coherent and 64-bit capable system-wide; the attributes
> (in DSDT and IORT) reflect this.
> 
> lspci on the system looks like this:
> -[0000:00]-+-00.0-[01-1e]--+-04.0  14e4:9026
>            |               +-04.1  14e4:9026
>            |               +-05.0  14e4:9027
>            |               +-05.1  14e4:9027
>            |               +-0a.0-[02-03]----00.0-[03]--
>            |               +-0a.1-[04-05]----00.0-[05]--
>            |           [...etc...]
>            |               +-0b.0-[12-14]----00.0-[13-14]--+-00.0  8086:1583
>            |               |                               \-00.1  8086:1583
>            |           [...etc...]
>            |               \-0b.5-[1d-1e]----00.0-[1e]--
>            \-00.1-[1f-3b]--+-04.0  14e4:9026
>                            +-04.1  14e4:9026
>                            +-05.0  14e4:9027
>                            +-05.1  14e4:9027
>                            +-0a.0-[20-21]----00.0-[21]--
>                        [...etc...]
> 
> The devices here are:
>  - 00:00.0 and 00:00.1 are the node (socket) level bridges
>  - 01:[45].x and 1f:[45].x are SoC PCI devices like SATA and USB
>  - 01:[ab].x and 1f:[ab].x are the PCI-PCIe "reverse"/glue bridges
>  - 02:00.0 etc are the "real" PCIe ports connected to external PCIe cards. 
> Each node has a GIC ITS, and a group of 4 PCIe ports have an SMMU.
> 
> The IORT is built by the firmware based on its PCI enumeration. The IORT
> will have multiple entries under the PCI RC node:
>  - one entry per node to map the SoC devices directly to ITS for MSI-X,
>    since the SoC devices are not attached to any SMMU.
>  - An entry per "real" PCIe port to map RIDs under it to the corresponding
>    SMMU.
> The SMMU nodes will have an entry to map its RID ranges to the node ITS.
> 
> The IORT spec supports this configuration, and the corresponding code is
> already upstream, so the only sticking point right now is
> pci_for_each_dma_alias().

Thanks, that helps a lot. The "single global ECAM space" idea was
eluding me, but in that context it all makes much more sense - I'm
assuming the two quirked device IDs correspond to the 00:00.[01] devices
and the [02-1e]:00.0 ones.

So (at the risk of Jon mooing at me), I guess the DT description would
be a single node looking something like:

pcie {
	reg = [global ECAM space for segment 0000];

	msi-map = <0x0100 &its0 0x0100 0x1d00>,
		  <0x1f00 &its1 0x1f00 0x1d00>;
	iommu-map = <0x0200 &smmu0 0x0200 0x1c00>,
		    <0x2000 &smmu0 0x2000 0x1c00>;
};

(note to self: which incidentally also means of_pci_map_rid() probably
wants fixing to not treat gaps in the map as an error)
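
To make the "gaps in the map" point concrete, here is a minimal
user-space sketch of a RID lookup over the msi-map tuples above. It is
not the of_pci_map_rid() implementation: struct rid_map_entry,
rid_map_lookup() and the string standing in for the phandle are all
hypothetical names for illustration, and the pass-through on a miss is
an assumption about what "not an error" could mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One <rid-base target out-base length> tuple. "target" is a plain string
 * standing in for the phandle; hypothetical, for illustration only. */
struct rid_map_entry {
	uint32_t rid_base;
	const char *target;
	uint32_t out_base;
	uint32_t length;
};

/* The two msi-map entries from the DT sketch, copied as written. */
static const struct rid_map_entry msi_map[] = {
	{ 0x0100, "its0", 0x0100, 0x1d00 },
	{ 0x1f00, "its1", 0x1f00, 0x1d00 },
};

/*
 * Translate rid through the map. Returns the matching entry, or NULL when
 * rid falls into a gap; in that case *out is the untranslated rid (assumed
 * behaviour, treating a gap as "no mapping" rather than as a failure).
 */
static const struct rid_map_entry *
rid_map_lookup(const struct rid_map_entry *map, size_t n,
	       uint32_t rid, uint32_t *out)
{
	assert(map != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (rid >= map[i].rid_base &&
		    rid - map[i].rid_base < map[i].length) {
			*out = rid - map[i].rid_base + map[i].out_base;
			return &map[i];
		}
	}

	*out = rid;	/* gap: pass through unchanged */
	return NULL;
}
```

With these entries, RID 0x1300 (13:00.0) resolves to the first entry,
while RID 0x0000 (the 00:00.0 node bridge) matches nothing and comes
back as a gap rather than a failure.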

With only one node like that, rather than having the whole first 3
levels of bridges described, the "stop at the appropriate node in the
callback" approach does become even more impractical in all cases. So,
for $TITLE, based on the above understanding:

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

Cheers,
Robin.

> 
> JC.
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-10 11:38           ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-10 11:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robin Murphy, linux-pci, Alex Williamson, iommu,
	linux-arm-kernel, Jon Masters

[Moving Bjorn back to To:]

On Tue, Apr 04, 2017 at 03:28:26PM +0100, Robin Murphy wrote:
> On 04/04/17 12:50, Jayachandran C wrote:
> > On Mon, Apr 03, 2017 at 04:07:53PM +0100, Robin Murphy wrote:
> >> On 03/04/17 14:15, Jayachandran C wrote:
> >>> On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> >>> topology is slightly unusual. For a multi-node system, it looks like:
> >>>
> >>> [node level PCI bridges - one per node]
> >>>     [SoC PCI devices with MSI-X but no IOMMU]
> >>>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> >>>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> >>>             [External PCI devices connected to PCIe links]
> >>
> >> Since it's not entirely obvious, what does the actual DT - or IORT if
> >> you must ;) - topology for this look like? I can't help thinking that
> >> either it's inaccurate, or that this is going to expose a shortcoming in
> >> pci_dma_configure() which breaks things - unless I'm missing something,
> >> isn't find_pci_root_bus() going to go all the way up to the top-level
> >> glue bridge and pick up the wrong firmware node (if any) for the
> >> appropriate DMA properties?
> > 
> > I will try to describe the ACPI interface:
> > 
> > There is just one ECAM area, a single bus range and one set of memory
> > windows for the whole system - so there is just one entry in DSDT for
> > the PCI controller. This entry also corresponds to the PCI RC node in
> > IORT. DMA is coherent and supports 64 bits system-wide, the attributes
> > (in DSDT and IORT) reflect this.
> > 
> > lspci on the system looks like this:
> > -[0000:00]-+-00.0-[01-1e]--+-04.0  14e4:9026
> >            |               +-04.1  14e4:9026
> >            |               +-05.0  14e4:9027
> >            |               +-05.1  14e4:9027
> >            |               +-0a.0-[02-03]----00.0-[03]--
> >            |               +-0a.1-[04-05]----00.0-[05]--
> >            |           [...etc...]
> >            |               +-0b.0-[12-14]----00.0-[13-14]--+-00.0  8086:1583
> >            |               |                               \-00.1  8086:1583
> >            |           [...etc...]
> >            |               \-0b.5-[1d-1e]----00.0-[1e]--
> >            \-00.1-[1f-3b]--+-04.0  14e4:9026
> >                            +-04.1  14e4:9026
> >                            +-05.0  14e4:9027
> >                            +-05.1  14e4:9027
> >                            +-0a.0-[20-21]----00.0-[21]--
> >                        [...etc...]
> > 
> > The devices here are:
> >  - 00:00.0 and 00:00.1 are the node (socket) level bridges
> >  - 01:[45].x and 1f:[45].x are SoC PCI devices like SATA and USB
> >  - 01:[ab].x and 1f:[ab].x are the PCI-PCIe "reverse"/glue bridges
> >  - 02:00.0 etc are the "real" PCIe ports connected to external PCIe cards. 
> > Each node has a GIC ITS, and a group of 4 PCIe ports have an SMMU.
> > 
> > The IORT is built by the firmware based on its PCI enumeration. The IORT
> > will have multiple entries under the PCI RC node:
> >  - one entry per node to map the SoC devices directly to ITS for MSI-X,
> >    since the SoC devices are not attached to any SMMU.
> >  - An entry per "real" PCIe port to map RIDs under it to the corresponding
> >    SMMU.
> > The SMMU nodes will have an entry to map its RID ranges to the node ITS.
> > 
> > The IORT spec supports this configuration, and the corresponding code is
> > already upstream, so the only sticking point right now is
> > pci_for_each_dma_alias().
> 
> Thanks, that helps a lot. The "single global ECAM space" idea was
> eluding me, but in that context it all makes much more sense - I'm
> assuming the two quirked device IDs correspond to the 00:00.[01] devices
> and the [02-1e]:00.0 ones.
> 
> So (at the risk of Jon mooing at me), I guess the DT description would
> be a single node looking something like:
> 
> pcie {
> 	reg = [global ECAM space for segment 0000];
> 
> 	msi-map = <0x0100 &its0 0x0100 0x1d00>,
> 		  <0x1f00 &its1 0x1f00 0x1d00>;
> 	iommu-map = <0x0200 &smmu0 0x0200 0x1c00>,
> 		    <0x2000 &smmu0 0x2000 0x1c00>;
> };
> 
> (note to self: which incidentally also means of_pci_map_rid() probably
> wants fixing to not treat gaps in the map as an error)
> 
> With only one node like that, rather than having the whole first 3
> levels of bridges described, the "stop at the appropriate node in the
> callback" approach does become even more impractical in all cases. So,
> for $TITLE, based on the above understanding:
> 
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>

Hi Bjorn,

This seems to be a reasonable way to add support for the quirk.
I would really appreciate your feedback.

Thanks,
JC.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-03 13:15   ` Jayachandran C
@ 2017-04-11  1:28     ` Bjorn Helgaas
  -1 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-11  1:28 UTC (permalink / raw)
  To: Jayachandran C
  Cc: linux-pci, Alex Williamson, iommu, Jon Masters, Robin Murphy,
	linux-arm-kernel

Hi Jayachandran,

On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> topology is slightly unusual. For a multi-node system, it looks like:
> 
> [node level PCI bridges - one per node]
>     [SoC PCI devices with MSI-X but no IOMMU]
>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
>             [External PCI devices connected to PCIe links]
> 
> The top two levels of bridges should have introduced aliases since they
> are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> In the case of external PCIe devices, the "real" root ports are connected
> to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> node level bridges do not introduce an alias either.
> 
> To handle this quirk, we mark the real PCIe root ports and node level
> PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> pci_for_each_dma_alias() works correctly for external PCIe devices and
> SoC PCI devices.
> 
> For the current revision of Cavium ThunderX2, the VendorID and Device ID
> are from Broadcom Vulcan (14e4:90XX).

Can you supply some text here about why we want to apply this patch?
E.g., does it avoid making unnecessary IOMMU mappings, improve
performance, avoid a crash, etc?

> Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
> ---
>  drivers/pci/quirks.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..564a84a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
>  
>  /*
> + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
> + * associated not at the root bus, but at a bridge below. This quirk flag
> + * will ensure that the aliases are identified correctly.
> + */
> +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
> +{
> +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
> +				quirk_bridge_cavm_thrx2_pcie_root);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
> +				quirk_bridge_cavm_thrx2_pcie_root);
> +
> +/*
>   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
>   * class code.  Fix it.
>   */
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-11  1:28     ` Bjorn Helgaas
  (?)
@ 2017-04-11  7:10       ` Jayachandran C
  -1 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-11  7:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, iommu, Alex Williamson, Jon Masters, Robin Murphy,
	linux-arm-kernel

On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> Hi Jayachandran,
> 
> On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > topology is slightly unusual. For a multi-node system, it looks like:
> > 
> > [node level PCI bridges - one per node]
> >     [SoC PCI devices with MSI-X but no IOMMU]
> >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> >             [External PCI devices connected to PCIe links]
> > 
> > The top two levels of bridges should have introduced aliases since they
> > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > In the case of external PCIe devices, the "real" root ports are connected
> > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > node level bridges do not introduce an alias either.
> > 
> > To handle this quirk, we mark the real PCIe root ports and node level
> > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > SoC PCI devices.
> > 
> > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > are from Broadcom Vulcan (14e4:90XX).
> 
> Can you supply some text here about why we want to apply this patch?
> E.g., does it avoid making unnecessary IOMMU mappings, improve
> performance, avoid a crash, etc?

If this is for the commit message, I hope the following is ok:

"With this change, both MSI-X and IO virtualization work correctly on
Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
devices, and the IOMMU groups are set up correctly."

I can send out a new patch if needed.

The on-chip SATA and USB controllers use MSI-X, so this is needed for
basic functionality of the platform.
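
To illustrate why the reported ID changes, below is a simplified
user-space model of the alias walk. It is only a sketch, not the kernel
pci_for_each_dma_alias(): struct node, for_each_alias() and
last_alias_for() are hypothetical names, the model assumes every
conventional bridge aliases to its own RID, and it assumes the walk
simply stops at a bridge carrying the flag. The RIDs are taken from the
lspci output earlier in the thread (13:00.0 behind 12:00.0, 01:0b.0 and
00:00.0; 01:04.0 directly behind 00:00.0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum node_kind { ENDPOINT, PCIE_PORT, PCI_BRIDGE };

struct node {
	uint16_t rid;			/* (bus << 8) | devfn */
	enum node_kind kind;
	bool xlate_root;		/* models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */
	const struct node *parent;	/* upstream bridge, NULL at the top */
};

typedef void (*alias_fn)(uint16_t alias, void *data);

/* Report the device's own RID, then one alias per conventional bridge on
 * the way up, stopping at the first bridge marked as translation root. */
static void for_each_alias(const struct node *dev, alias_fn fn, void *data)
{
	assert(dev != NULL && fn != NULL);

	fn(dev->rid, data);
	for (const struct node *br = dev->parent; br; br = br->parent) {
		if (br->xlate_root)
			break;
		if (br->kind == PCI_BRIDGE)
			fn(br->rid, data);
	}
}

/* Callback for a caller that keeps only the last alias reported. */
static void record_last(uint16_t alias, void *data)
{
	*(uint16_t *)data = alias;
}

/* Build the node-0 path from the lspci output and return the last alias
 * seen for either the SoC device 01:04.0 or the external device 13:00.0. */
static uint16_t last_alias_for(bool quirk, bool soc_device)
{
	const struct node node_br = { 0x0000, PCI_BRIDGE, quirk, NULL };
	const struct node soc_dev = { 0x0120, ENDPOINT, false, &node_br };
	const struct node glue_br = { 0x0158, PCI_BRIDGE, false, &node_br };
	const struct node root_pt = { 0x1200, PCIE_PORT, quirk, &glue_br };
	const struct node ext_dev = { 0x1300, ENDPOINT, false, &root_pt };
	uint16_t last = 0xffff;

	for_each_alias(soc_device ? &soc_dev : &ext_dev, record_last, &last);
	return last;
}
```

In this model, without the flag both devices end up with the node
bridge's RID as the final alias; with the flag set on the node bridge
and the real root port, each device keeps its own RID.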

> 
> > Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
> > ---
> >  drivers/pci/quirks.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6736836..564a84a 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
> >  
> >  /*
> > + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
> > + * associated not at the root bus, but at a bridge below. This quirk flag
> > + * will ensure that the aliases are identified correctly.
> > + */
> > +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
> > +{
> > +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
> > +}
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
> > +				quirk_bridge_cavm_thrx2_pcie_root);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
> > +				quirk_bridge_cavm_thrx2_pcie_root);
> > +
> > +/*
> >   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
> >   * class code.  Fix it.
> >   */

Thanks,
JC.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-11  7:10       ` Jayachandran C
  (?)
@ 2017-04-11 13:41         ` Bjorn Helgaas
  -1 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-11 13:41 UTC (permalink / raw)
  To: Jayachandran C
  Cc: linux-pci, Joerg Roedel, Alex Williamson, iommu, Jon Masters,
	Robin Murphy, linux-arm-kernel

[+cc Joerg]

On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > Hi Jayachandran,
> > 
> > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > topology is slightly unusual. For a multi-node system, it looks like:
> > > 
> > > [node level PCI bridges - one per node]
> > >     [SoC PCI devices with MSI-X but no IOMMU]
> > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > >             [External PCI devices connected to PCIe links]
> > > 
> > > The top two levels of bridges should have introduced aliases since they
> > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > In the case of external PCIe devices, the "real" root ports are connected
> > > to the SMMU and the GIC ITS, so the PCI-PCIe bridge does not introduce an
> > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > node level bridges do not introduce an alias either.
> > > 
> > > To handle this quirk, we mark the real PCIe root ports and node level
> > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > SoC PCI devices.
> > > 
> > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > are from Broadcom Vulcan (14e4:90XX).
> > 
> > Can you supply some text here about why we want to apply this patch?
> > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > performance, avoid a crash, etc?
> 
> If this is for the commit message, I hope the following is ok:
> 
> "With this change, both MSI-X and IO virtualization work correctly on
> Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> devices, and the IOMMU groups are set up correctly."

This doesn't get at what the actual problem is.  I'm hoping for
something like "without this change, we set up an IOMMU mapping for
requester ID X, but device DMA uses requester ID Y because ...., which
results in an IOMMU fault"
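
To make "requester ID X versus requester ID Y" concrete: a requester ID
is the 16-bit bus/device/function tuple that accompanies each
transaction. The user-space sketch below only illustrates that
encoding. The helper names are hypothetical (modeled on the kernel's
PCI_DEVFN() and PCI_DEVID() macros), and any bus or slot numbers fed to
them are made up rather than taken from ThunderX2 hardware.

```c
#include <stdint.h>

/*
 * Hypothetical user-space helpers, modeled on the kernel's PCI_DEVFN()
 * and PCI_DEVID() macros.  They are not kernel code.
 */

/* Pack a 5-bit slot (device) number and a 3-bit function number. */
static inline uint8_t rid_devfn(unsigned int slot, unsigned int func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* Pack an 8-bit bus number and a devfn into a 16-bit requester ID. */
static inline uint16_t rid_make(unsigned int bus, uint8_t devfn)
{
	return (uint16_t)(((bus & 0xff) << 8) | devfn);
}

/* Non-zero if the ID that was mapped is the ID the device really uses. */
static inline int rid_matches(uint16_t mapped, uint16_t on_the_wire)
{
	return mapped == on_the_wire;
}
```

If the ID handed to the IOMMU or ITS driver belongs to a bridge above
the device while the device's transactions carry the device's own ID,
rid_matches() is false, which is the kind of mismatch the requested
changelog text should spell out.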

I've been puzzling over the fact that most of the callers of
pci_for_each_dma_alias() don't seem to use it correctly.  For Intel
IOMMUs, domain_context_mapping() uses it to add a mapping for every
possible alias.  But most of the other callers only look at the last
alias and ignore all the others.  That might work most of the time,
but:

  - There's no guarantee that pci_for_each_dma_alias() iterates in any
    particular order, so relying on the current order is fragile,

  - The pci_add_dma_alias() interface allows an arbitrary number of
    aliases (as long as they're all on the same bus), and some devices
    do use more than one, e.g., quirk_dma_func0_alias(),
    quirk_mic_x200_dma_alias(),

  - pci_for_each_dma_alias() translates the rules in the PCIe to
    PCI/PCI-X Bridge spec, r1.0, sec 2.3, about taking ownership into
    aliases.  I think it's important to pay attention to *every*
    possible alias, not just the last one.
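
The difference between the two caller patterns described above can be
sketched in user space. This is a toy model under stated assumptions,
not kernel code: for_each_alias() is a hypothetical stand-in for
pci_for_each_dma_alias() that walks a plain array, the callback type
drops the struct pci_dev argument of the real callback, and struct
alias_set and both callbacks are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified callback type; the real kernel callback also gets the device. */
typedef int (*alias_fn)(uint16_t alias, void *data);

/* Toy iterator: call fn for each alias, stop early on a non-zero return. */
static int for_each_alias(const uint16_t *aliases, size_t n,
			  alias_fn fn, void *data)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = fn(aliases[i], data);
		if (ret)
			break;
	}
	return ret;
}

/* Pattern 1: overwrite on every call, so only the last alias survives. */
static int keep_last(uint16_t alias, void *data)
{
	*(uint16_t *)data = alias;
	return 0;
}

struct alias_set {
	uint16_t ids[8];
	size_t count;
};

/* Pattern 2: record every alias so that each one can be mapped. */
static int keep_all(uint16_t alias, void *data)
{
	struct alias_set *set = data;

	if (set->count >= sizeof(set->ids) / sizeof(set->ids[0]))
		return -1;	/* out of room: abort the walk */
	set->ids[set->count++] = alias;
	return 0;
}
```

With keep_last() the result depends entirely on iteration order, and
every alias except the final one is silently dropped. keep_all() is
independent of the order and loses nothing, at the cost of the caller
having to handle a set rather than a single ID.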

I suspect the reason this patch makes a difference is that the
current pci_for_each_dma_alias() believes one of those top-level
bridges is an alias, and the iterator produces it last, so that's the
one you map.  The IOMMU is attached lower down, so that top-level
bridge is not in fact an alias, but since you only look at the *last*
one, you don't map the correct aliases from lower down in the tree.

Stopping the iterator earlier happens to make the last alias be one of
the correct ones, but it doesn't solve the problems of quirked devices
that can use multiple requester IDs, and it doesn't solve the problem
of PCIe-to-PCI bridges that optionally take ownership of transactions.
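
The effect of stopping the walk early can be modeled the same way. The
sketch below is a deliberately simplified user-space model, not the
code in drivers/pci/search.c: struct hop, its adds_alias field, and
last_alias() are hypothetical, XLATE_ROOT merely stands in for
PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT, and the real alias rules for each
bridge type are reduced to a single boolean.

```c
#include <stddef.h>
#include <stdint.h>

#define XLATE_ROOT 0x1	/* stand-in for PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */

struct hop {
	uint16_t rid;		/* requester ID of this bridge */
	int adds_alias;		/* would the generic rules alias through it? */
	unsigned int flags;
};

/*
 * Walk from the bridge nearest the device towards the root and return
 * the last alias reported.  A hop carrying XLATE_ROOT ends the walk
 * before it, or anything above it, can contribute an alias.
 */
static uint16_t last_alias(uint16_t dev_rid, const struct hop *path, size_t n)
{
	uint16_t last = dev_rid;	/* the device itself is reported first */
	size_t i;

	for (i = 0; i < n; i++) {
		if (path[i].flags & XLATE_ROOT)
			break;
		if (path[i].adds_alias)
			last = path[i].rid;
	}
	return last;
}
```

For a path of real root port, glue bridge, and node-level bridge, an
unflagged walk ends on the node-level bridge's ID, while flagging the
real root port leaves the device's own ID as the last alias. The model
still returns a single ID, so it shares the limitation noted above for
devices that can use more than one requester ID.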

> I can send out a new patch if needed.
> 
> The on-chip SATA and USB use MSI-X, so this is needed for basic
> functionality of the platform.

No need for a new patch; I can integrate something into the changelog.

> > > Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
> > > ---
> > >  drivers/pci/quirks.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..564a84a 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
> > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
> > >  
> > >  /*
> > > + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
> > > + * associated not at the root bus, but at a bridge below. This quirk flag
> > > + * will ensure that the aliases are identified correctly.
> > > + */
> > > +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
> > > +{
> > > +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
> > > +}
> > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
> > > +				quirk_bridge_cavm_thrx2_pcie_root);
> > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
> > > +				quirk_bridge_cavm_thrx2_pcie_root);
> > > +
> > > +/*
> > >   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
> > >   * class code.  Fix it.
> > >   */
> 
> Thanks,
> JC.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 0/2] Handle Cavium ThunderX2 PCI topology quirk
@ 2017-04-11 13:44   ` Bjorn Helgaas
  0 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-11 13:44 UTC (permalink / raw)
  To: Jayachandran C
  Cc: David Daney, linux-pci, iommu, Alex Williamson, Jon Masters,
	Robin Murphy, linux-arm-kernel

[+cc David]

I forgot to mention that I'm also hoping for an ack from David, since
he's listed as the maintainer of the ThunderX drivers.

On Mon, Apr 03, 2017 at 01:15:02PM +0000, Jayachandran C wrote:
> Hi Bjorn, Alex,
> 
> Sending this again (with a trivial fix to author name), please review.
> Updated summary below:
> 
> Here is v4 of the patchset to handle the PCIe topology quirk of Cavium
> ThunderX2 systems (previously known as Broadcom Vulcan).
> 
> The earlier discussions on this can be seen at:
> http://www.spinics.net/lists/linux-pci/msg51001.html
> https://patchwork.ozlabs.org/patch/582633/ and
> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html
> 
> The earlier discussion on this patchset ended with a suggestion that it
> may be possible to fix up this quirk by handling the issue in the
> function argument of pci_for_each_dma_alias(). But at that point we did
> not have the codebase to make the changes since the full ACPI and OF code
> for SMMU and GIC ITS was not upstream.
> 
> Now that the changes are upstream, I tried to fix it in both the SMMU
> and the GIC ITS code based on this suggestion, the changes needed are at:
>  https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
> 
> The problems with this approach are:
>  - of the 14 uses of the function pci_for_each_dma_alias in the kernel
>    tree, I have to fix up 6 callers (which is all but one of the callers
>    outside x86)
>  - 4 of these can be reasonably handled (please see the github repo above),
>    but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
>    drivers/iommu/iommu.c cannot be reasonably fixed up.
>  - Even without the two changes above I can get it to work for now.
>    But pci_for_each_dma_alias does not work as expected on this platform
>    and we have to be aware of that for all future uses of the function.
>   
> For now, I have ruled out the approach, and I have rebased the earlier
> patch on to 4.11-rc and am submitting it again for review. The changes are:
> 
> v3->v4:
>  - new address of author
> 
> v2->v3:
>  - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
>    PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>  - updated commit message to make the quirk clearer.
> 
> Let me know your comments and suggestions.
> 
> Thanks,
> JC.
> 
> 
> Jayachandran C (2):
>   PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>   PCI: quirks: Fix ThunderX2 dma alias handling
> 
>  drivers/pci/quirks.c | 14 ++++++++++++++
>  drivers/pci/search.c |  4 ++++
>  include/linux/pci.h  |  2 ++
>  3 files changed, 20 insertions(+)
> 
> -- 
> 2.7.4
> 
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 0/2] Handle Cavium ThunderX2 PCI topology quirk
@ 2017-04-11 14:23         ` Bjorn Helgaas
  0 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-11 14:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robin Murphy, linux-pci, iommu, Alex Williamson, Jon Masters,
	David Daney, linux-arm-kernel, Jayachandran C

[-- Attachment #1: Type: text/plain, Size: 3128 bytes --]

On Apr 11, 2017 8:48 AM, "Bjorn Helgaas" <helgaas@kernel.org> wrote:

[+cc David]

I forgot to mention that I'm also hoping for an ack from David, since
he's listed as the maintainer of the ThunderX drivers.


Never mind this, Jon pointed out that ThunderX2 is different than
ThunderX.  Sorry for the noise, David.

On Mon, Apr 03, 2017 at 01:15:02PM +0000, Jayachandran C wrote:
> Hi Bjorn, Alex,
>
> Sending this again (with a trivial fix to author name), please review.
> Updated summary below:
>
> Here is v4 of the patchset to handle the PCIe topology quirk of Cavium
> ThunderX2 systems (previously known as Broadcom Vulcan).
>
> The earlier discussions on this can be seen at:
> http://www.spinics.net/lists/linux-pci/msg51001.html
> https://patchwork.ozlabs.org/patch/582633/ and
> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html
>
> The earlier discussion on this patchset ended with a suggestion that it
> may be possible to fix up this quirk by handling the issue in the
> function argument of pci_for_each_dma_alias(). But at that point we did
> not have the codebase to make the changes since the full ACPI and OF code
> for SMMU and GIC ITS was not upstream.
>
> Now that the changes are upstream, I tried to fix it in both the SMMU
> and the GIC ITS code based on this suggestion, the changes needed are at:
>  https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
>
> The problems with this approach are:
>  - of the 14 uses of the pci_for_each_dma_alias() function in the kernel
>    tree, I have to fix up 6 callers (which is all but one of the callers
>    outside x86)
>  - 4 of these can be reasonably handled (please see the github repo above),
>    but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
>    drivers/iommu/iommu.c cannot be reasonably fixed up.
>  - Even without the above two changes I can get it to work for now.
>    But pci_for_each_dma_alias does not work as expected on this platform
>    and we have to be aware of that for all future uses of the function.
>
> For now, I have ruled out the approach, and I have rebased the earlier
> patch on to 4.11-rc and am submitting it again for review. The changes are:
>
> v3->v4:
>  - new address of author
>
> v2->v3:
>  - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
>    PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>  - updated commit message to make the quirk clearer.
>
> Let me know your comments and suggestions.
>
> Thanks,
> JC.
>
>
> Jayachandran C (2):
>   PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>   PCI: quirks: Fix ThunderX2 dma alias handling
>
>  drivers/pci/quirks.c | 14 ++++++++++++++
>  drivers/pci/search.c |  4 ++++
>  include/linux/pci.h  |  2 ++
>  3 files changed, 20 insertions(+)
>
> --
> 2.7.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-11 15:27           ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-11 15:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, iommu, Alex Williamson, Jon Masters, Robin Murphy,
	linux-arm-kernel, Joerg Roedel

On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> [+cc Joerg]
> 
> On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > Hi Jayachandran,
> > > 
> > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > 
> > > > [node level PCI bridges - one per node]
> > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > >             [External PCI devices connected to PCIe links]
> > > > 
> > > > The top two levels of bridges should have introduced aliases since they
> > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > node level bridges do not introduce an alias either.
> > > > 
> > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > SoC PCI devices.
> > > > 
> > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > are from Broadcom Vulcan (14e4:90XX).
> > > 
> > > Can you supply some text here about why we want to apply this patch?
> > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > performance, avoid a crash, etc?
> > 
> > If this is for the commit message, I hope the following is ok:
> > 
> > "With this change, both MSI-X and IO virtualization work correctly on
> > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > devices, and the IOMMU groups are set up correctly."
> 
> This doesn't get at what the actual problem is.  I'm hoping for
> something like "without this change, we set up an IOMMU mapping for
> requestor ID X, but device DMA uses requestor ID Y because ...., which
> results in an IOMMU fault"

Ok. I hope this would be better:

"Without this change, the last alias seen while traversing the PCI
hierarchy will be used as the RID to generate the device ID for ITS
and stream ID for SMMU. This in turn causes the MSI-X generated by the
device to fail since the ITS expects to have translation tables based
on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
device DMA also fails when SMMU is enabled due to an incorrect value in
the SMMU translation tables."

> I've been puzzling over the fact that most of the callers of
> pci_for_each_dma_alias() don't seem to use it correctly.  For Intel
> IOMMUs, domain_context_mapping() uses it to add a mapping for every
> possible alias.  But most of the other callers only look at the last
> alias and ignore all the others.  That might work most of the time,
> but:
> 
>   - There's no guarantee that pci_for_each_dma_alias() iterates in any
>     particular order, so relying on the current order is fragile,
> 
>   - The pci_add_dma_alias() interface allows an arbitrary number of
>     aliases (as long as they're all on the same bus), and some devices
>     do use more than one, e.g., quirk_dma_func0_alias(),
>     quirk_mic_x200_dma_alias(),
> 
>   - pci_for_each_dma_alias() translates the rules in the PCIe to
>     PCI/PCI-X Bridge spec, r1.0, sec 2.3, about taking ownership into
>     aliases.  I think it's important to pay attention to *every*
>     possible alias, not just the last one.

pci_for_each_dma_alias() is used by the ARM code to find the RID
(Requester ID), and this is taken as the last alias as seen from the
PCI controller (RC). The RID is then used to program the Device ID
of the GIC ITS (ARM generic interrupt controller's interrupt translation
service) for MSI-X (and similarly to program Stream ID of the SMMU).

The translation from RID to Device ID or stream ID is provided by the
IORT ACPI table[1] or by the {iommu,msi}-{map,mask} [2] properties in
the device tree.
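
To make the RID translation concrete, here is a minimal userspace sketch
of the lookup described in the binding document [2]: the RID is masked,
matched against a (rid-base, length) window, and offset into the output
ID space. The struct and function names are made up for illustration and
are not the kernel's; the IORT ID mappings follow a similar
base/length/output-base scheme as far as I understand it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of an iommu-map / msi-map style table (illustrative names). */
struct rid_map_entry {
	uint32_t rid_base;	/* first RID covered by this entry */
	uint32_t out_base;	/* stream ID / device ID that rid_base maps to */
	uint32_t length;	/* number of consecutive RIDs covered */
};

/*
 * Translate a RID to an output ID (stream ID or device ID).
 * Returns true and stores the result in *out if an entry covers the
 * masked RID, false if no entry matches.
 */
static bool rid_translate(const struct rid_map_entry *map, size_t n,
			  uint32_t mask, uint32_t rid, uint32_t *out)
{
	uint32_t masked = rid & mask;

	for (size_t i = 0; i < n; i++) {
		if (masked >= map[i].rid_base &&
		    masked - map[i].rid_base < map[i].length) {
			*out = masked - map[i].rid_base + map[i].out_base;
			return true;
		}
	}
	return false;
}
```

The point for this thread is that the input to this lookup has to be the
RID as seen at the point where the SMMU or ITS is attached; feeding it a
bridge alias from higher up selects the wrong output ID.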

Taking the last alias may be reasonable since the mapping is from
(PCI RC, RID) to (SMMU, streamID) or (GIC ITS, deviceID) and we are
looking for a single RID for a device as seen from the controller.
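
As a rough illustration of the "last alias" pattern, the sketch below
models it in plain userspace C. The iterator is a toy stand-in that just
walks an array; only the callback convention (return nonzero to stop) is
meant to mirror pci_for_each_dma_alias(). All toy_* names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback type modelled on the iterator's convention: nonzero stops. */
typedef int (*toy_alias_fn)(uint16_t alias, void *data);

/* Toy stand-in for the iterator: visits aliases in the order given. */
static int toy_for_each_alias(const uint16_t *aliases, size_t n,
			      toy_alias_fn fn, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(aliases[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* "Last alias wins": overwrite the result on every call, never stop. */
static int toy_get_last_rid(uint16_t alias, void *data)
{
	*(uint32_t *)data = alias;
	return 0;
}
```

Because the callback keeps overwriting its result, whatever the iterator
reports last becomes the RID, which is why the visiting order and the
point where the walk ends matter so much for these callers.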

> I suspect the reason this patch makes a difference is because the
> current pci_for_each_dma_alias() believes one of those top-level
> bridges is an alias, and the iterator produces it last, so that's the
> one you map.  The IOMMU is attached lower down, so that top-level
> bridge is not in fact an alias, but since you only look at the *last*
> one, you don't map the correct aliases from lower down in the tree.

Exactly. The IORT spec allows a range of RIDs to map to an SMMU, which
means that a PCI RC can have multiple SMMUs, each handling a subset of RIDs.

In the case of Cavium ThunderX2, the RID which we should see on the RC
- if we follow the standard and factor in the aliasing introduced by the
PCI bridge and the PCI/PCIe bridge - is not the RID seen by the SMMU (or
ITS).

But, if we stop the traversal at the point where SMMU (or ITS) is
attached, we will get the correct RID as seen by these.
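
The effect of stopping the traversal can be sketched with a toy model of
the hierarchy. This is not the code from patch 1/2; the struct, the
xlate_root field and the walk below are simplified assumptions that only
show why ending the walk at the flagged bridge changes which alias comes
last.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device node; all names here are hypothetical. */
struct toy_dev {
	uint16_t rid;			/* requester ID of this device */
	bool introduces_alias;		/* bridge that would alias requests */
	bool xlate_root;		/* stand-in for the quirk flag */
	const struct toy_dev *parent;	/* upstream bridge, NULL at the top */
};

typedef int (*toy_walk_fn)(uint16_t alias, void *data);

/* Report the device's own RID, then aliases from the bridges above it. */
static int toy_walk_aliases(const struct toy_dev *dev, toy_walk_fn fn,
			    void *data)
{
	int ret = fn(dev->rid, data);

	if (ret)
		return ret;

	for (const struct toy_dev *up = dev->parent; up; up = up->parent) {
		if (up->xlate_root)
			break;	/* SMMU/ITS sits here; ignore what is above */
		if (up->introduces_alias) {
			ret = fn(up->rid, data);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Same "last alias wins" callback style the ARM callers rely on. */
static int toy_last_rid(uint16_t alias, void *data)
{
	*(uint32_t *)data = alias;
	return 0;
}
```

With a node bridge, a glue bridge, a root port and an endpoint chained
together, the unflagged walk ends on the node bridge's alias, while
flagging the root port leaves the endpoint's own RID as the last value
reported.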

> Stopping the iterator earlier happens to make the last alias be one of
> the correct ones, but it doesn't solve the problems of quirked devices
> that can use multiple requester IDs, and it doesn't solve the problem
> of PCIe-to-PCI bridges that optionally take ownership of transactions.
 
If these happen below the point where the SMMU is attached, we will
consider the last alias introduced, which should be ok. If they are
above, the alias introduced is not relevant.  Devices with multiple
aliases are not handled anywhere in ARM code, so I don't think we should
consider that here.

> > I can send out a new patch if needed.
> > 
> > The on chip SATA and USB use MSI-X, so this is needed for basic
> > functionality of the platform.
> 
> No need for a new patch; I can integrate something into the changelog.
> 
> > > > Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
> > > > ---
> > > >  drivers/pci/quirks.c | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 6736836..564a84a 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
> > > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
> > > >  
> > > >  /*
> > > > + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
> > > > + * associated not at the root bus, but at a bridge below. This quirk flag
> > > > + * will ensure that the aliases are identified correctly.
> > > > + */
> > > > +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
> > > > +{
> > > > +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
> > > > +}
> > > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
> > > > +				quirk_bridge_cavm_thrx2_pcie_root);
> > > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
> > > > +				quirk_bridge_cavm_thrx2_pcie_root);
> > > > +
> > > > +/*
> > > >   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
> > > >   * class code.  Fix it.
> > > >   */

Thanks,
JC.
[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
[2] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-11 15:34           ` Robin Murphy
  0 siblings, 0 replies; 64+ messages in thread
From: Robin Murphy @ 2017-04-11 15:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Jayachandran C
  Cc: linux-pci, iommu, Alex Williamson, Jon Masters, linux-arm-kernel,
	Joerg Roedel, Marc Zyngier

On 11/04/17 14:41, Bjorn Helgaas wrote:
> [+cc Joerg]
> 
> On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
>> On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
>>> Hi Jayachandran,
>>>
>>> On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
>>>> On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
>>>> topology is slightly unusual. For a multi-node system, it looks like:
>>>>
>>>> [node level PCI bridges - one per node]
>>>>     [SoC PCI devices with MSI-X but no IOMMU]
>>>>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
>>>>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
>>>>             [External PCI devices connected to PCIe links]
>>>>
>>>> The top two levels of bridges should have introduced aliases since they
>>>> are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
>>>> In the case of external PCIe devices, the "real" root ports are connected
>>>> to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
>>>> alias. The SoC PCI devices are directly connected to the GIC ITS, so the
>>>> node level bridges do not introduce an alias either.
>>>>
>>>> To handle this quirk, we mark the real PCIe root ports and node level
>>>> PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
>>>> pci_for_each_dma_alias() works correctly for external PCIe devices and
>>>> SoC PCI devices.
>>>>
>>>> For the current revision of Cavium ThunderX2, the VendorID and Device ID
>>>> are from Broadcom Vulcan (14e4:90XX).
>>>
>>> Can you supply some text here about why we want to apply this patch?
>>> E.g., does it avoid making unnecessary IOMMU mappings, improve
>>> performance, avoid a crash, etc?
>>
>> If this is for the commit message, I hope the following is ok:
>>
>> "With this change, both MSI-X and IO virtualization work correctly on
>> Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
>> configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
>> devices, and the IOMMU groups are setup correctly."
> 
> This doesn't get at what the actual problem is.  I'm hoping for
> something like "without this change, we set up an IOMMU mapping for
> requestor ID X, but device DMA uses requestor ID Y because ...., which
> results in an IOMMU fault"

The fact that certain bits of code only consider the last-seen alias RID
does mean we will install all mappings for a RID the IOMMU will never
see, causing subsequent attempts to access them (via the real RID) to
fault. Even with that fixed, considering the past-the-IOMMU-connection
bridges will cause IOMMU group assignment to believe the IOMMU can't
distinguish them and thus prevent individual devices being assignable to
different VMs.
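
As a rough illustration of the "last-seen alias" pattern versus
considering every alias, here is a small user-space sketch in C. The
names toy_for_each_alias(), toy_keep_last() and toy_collect() are
hypothetical; the callback shape is only loosely modelled on the fn
argument of pci_for_each_dma_alias() (the real callback also receives
the struct pci_dev), and the list of RIDs is supplied by the caller
rather than derived from a real topology.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ALIASES 8

/* Toy callback type: return non-zero to stop the iteration early. */
typedef int (*toy_alias_fn)(uint16_t alias, void *data);

/* Toy iterator: hand each candidate RID to fn, in order. */
static int toy_for_each_alias(const uint16_t *rids, size_t n,
			      toy_alias_fn fn, void *data)
{
	int ret = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < n && !ret; i++)
		ret = fn(rids[i], data);
	return ret;
}

/* Fragile pattern: every call overwrites, so only the last alias survives. */
static int toy_keep_last(uint16_t alias, void *data)
{
	*(uint16_t *)data = alias;
	return 0;
}

struct toy_alias_set {
	uint16_t rid[TOY_MAX_ALIASES];
	size_t count;
};

/* Alternative: remember every alias so each one can be mapped. */
static int toy_collect(uint16_t alias, void *data)
{
	struct toy_alias_set *set = data;

	if (set->count == TOY_MAX_ALIASES)
		return -1;	/* out of room: stop and report failure */
	set->rid[set->count++] = alias;
	return 0;
}

/* Would a transaction tagged with rid find a mapping in this set? */
static int toy_set_contains(const struct toy_alias_set *set, uint16_t rid)
{
	for (size_t i = 0; i < set->count; i++)
		if (set->rid[i] == rid)
			return 1;
	return 0;
}
```

If the iterator is fed the endpoint's RID followed by the RIDs of two
upstream bridges, toy_keep_last() ends up holding only the topmost
bridge's RID, so a mapping keyed on it would miss traffic tagged with
the endpoint's own RID; toy_collect() retains all three.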

> I've been puzzling over the fact that most of the callers of
> pci_for_each_dma_alias() don't seem to use it correctly.  For Intel
> IOMMUs, domain_context_mapping() uses it to add a mapping for every
> possible alias.  But most of the other callers only look at the last
> alias and ignore all the others.

As it happens, I've just been looking into this and reaching the same
conclusion (not least because a fair few of the incorrect uses have
my fingerprints all over them). The of_iommu_configure() and
corresponding iort_iommu_configure() uses turn out to already be broken
WRT DMA phantom functions vs. MSIs, but pretty straightforward to fix
(and I now have one of the offending Marvell SATA cards to test things
with). The one in its_pci_msi_prepare() turns out to be totally
backwards, as it actually wants to iterate through every device which
may alias with the given device (any suggestions for the neatest way to
do that are most welcome).

The other uses in pci_msi_*() are a little trickier, as in general we're
assuming every device is associated with just a single ID (certainly for
ARM GICv3 this assumption runs quite strongly through the architecture)
- Marc and I have chucked some ideas around, but in the short term, it
might be easier to just not even pretend to support MSIs from behind
aliasing bridges for the DT/IORT cases. Either way, those are also
broken for the phantom function case (and the tentative fix I've written
will need rethinking in light of this discussion, oh well).

>  That might work most of the time,
> but:
> 
>   - There's no guarantee that pci_for_each_dma_alias() iterates in any
>     particular order, so relying on the current order is fragile,
> 
>   - The pci_add_dma_alias() interface allows an arbitrary number of
>     aliases (as long as they're all on the same bus), and some devices
>     do use more than one, e.g., quirk_dma_func0_alias(),
>     quirk_mic_x200_dma_alias(),
> 
>   - pci_for_each_dma_alias() translates the rules in the PCIe to
>     PCI/PCI-X Bridge spec, r1.0, sec 2.3, about taking ownership into
>     aliases.  I think it's important to pay attention to *every*
>     possible alias, not just the last one.

Ha, that exact page is still open on my desktop since the "oh crap!"
moment last Friday :)

If I've interpreted that spec correctly, and it definitely is the case
that a bridge may alias or not on a per-transaction basis, then that
does end up making matters simpler; I can remove all the attempts to
skip any IDs that the IOMMU is guaranteed never to see due to aliasing,
if that set is in fact empty.

> I suspect the reason this patch makes a difference is because the
> current pci_for_each_dma_alias() believes one of those top-level
> bridges is an alias, and the iterator produces it last, so that's the
> one you map.  The IOMMU is attached lower down, so that top-level
> bridge is not in fact an alias, but since you only look at the *last*
> one, you don't map the correct aliases from lower down in the tree.
> 
> Stopping the iterator earlier happens to make the last alias be one of
> the correct ones, but it doesn't solve the problems of quirked devices
> that can use multiple requester IDs, and it doesn't solve the problem
> of PCIe-to-PCI bridges that optionally take ownership of transactions.

Yes, that's pretty much the state of things, other than also solving the
legitimate problem of get_pci_alias_group() going too far and inferring
a false lack of isolation.

Robin.

>> I can send out a new patch if needed.
>>
>> The on chip SATA and USB use MSI-X, so this is needed for basic
>> functionality of the platform.
> 
> No need for a new patch; I can integrate something into the changelog.
> 
>>>> Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
>>>> ---
>>>>  drivers/pci/quirks.c | 14 ++++++++++++++
>>>>  1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>> index 6736836..564a84a 100644
>>>> --- a/drivers/pci/quirks.c
>>>> +++ b/drivers/pci/quirks.c
>>>> @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
>>>>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
>>>>  
>>>>  /*
>>>> + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
>>>> + * associated not at the root bus, but at a bridge below. This quirk flag
>>>> + * will ensure that the aliases are identified correctly.
>>>> + */
>>>> +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
>>>> +{
>>>> +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
>>>> +}
>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
>>>> +				quirk_bridge_cavm_thrx2_pcie_root);
>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
>>>> +				quirk_bridge_cavm_thrx2_pcie_root);
>>>> +
>>>> +/*
>>>>   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
>>>>   * class code.  Fix it.
>>>>   */
>>
>> Thanks,
>> JC.
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-11 15:27           ` Jayachandran C
@ 2017-04-11 15:43             ` Jon Masters
  -1 siblings, 0 replies; 64+ messages in thread
From: Jon Masters @ 2017-04-11 15:43 UTC (permalink / raw)
  To: Jayachandran C, Bjorn Helgaas
  Cc: linux-pci, iommu, Alex Williamson, Robin Murphy,
	linux-arm-kernel, Joerg Roedel

On 04/11/2017 11:27 AM, Jayachandran C wrote:
> On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:

>> I suspect the reason this patch makes a difference is because the
>> current pci_for_each_dma_alias() believes one of those top-level
>> bridges is an alias, and the iterator produces it last, so that's the
>> one you map.  The IOMMU is attached lower down, so that top-level
>> bridge is not in fact an alias, but since you only look at the *last*
>> one, you don't map the correct aliases from lower down in the tree.
> 
> Exactly. The IORT spec allows a range of RIDs to map to an SMMU, which
> means that a PCI RC can have multiple SMMUs, each handling a subset of RIDs.
> 
> In the case of Cavium ThunderX2, the RID which we should see on the RC
> - if we follow the standard and factor in the aliasing introduced by the
> PCI bridge and the PCI/PCIe bridge - is not the RID seen by the SMMU (or
> ITS).
> 
> But, if we stop the traversal at the point where SMMU (or ITS) is
> attached, we will get the correct RID as seen by these.

Side note that I am trying to get various specifications clarified to
promote more of a familiar alternative architecture (x86) approach in
the future in which these aren't at different levels in the topology.
But to do that requires integrated Root Complex IP with bells/whistles.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 0/2] Handle Cavium ThunderX2 PCI topology quirk
  2017-04-11 13:44   ` Bjorn Helgaas
@ 2017-04-11 16:01     ` David Daney
  -1 siblings, 0 replies; 64+ messages in thread
From: David Daney @ 2017-04-11 16:01 UTC (permalink / raw)
  To: Bjorn Helgaas, Jayachandran C
  Cc: David Daney, linux-pci, iommu, Alex Williamson, Jon Masters,
	Robin Murphy, linux-arm-kernel

On 04/11/2017 06:44 AM, Bjorn Helgaas wrote:
> [+cc David]
>
> I forgot to mention that I'm also hoping for an ack from David, since
> he's listed as the maintainer of the ThunderX drivers.
>

JC is really leading the development of this particular PCI 
implementation, but I am happy to supply my:

Acked-by: David Daney <david.daney@cavium.com>



> On Mon, Apr 03, 2017 at 01:15:02PM +0000, Jayachandran C wrote:
>> Hi Bjorn, Alex,
>>
>> Sending this again (with a trivial fix to author name), please review.
>> Updated summary below:
>>
>> Here is v4 of the patchset to handle the PCIe topology quirk of Cavium
>> ThunderX2 systems (previously known as Broadcom Vulcan).
>>
>> The earlier discussions on this can be seen at:
>> http://www.spinics.net/lists/linux-pci/msg51001.html
>> https://patchwork.ozlabs.org/patch/582633/ and
>> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html
>>
>> The earlier discussion on this patchset ended with a suggestion that it
>> may be possible to fix up this quirk by handling the issue in the
>> function argument of pci_for_each_dma_alias(). But at that point we did
>> not have the codebase to make the changes since the full ACPI and OF code
>> for SMMU and GIC ITS was not upstream.
>>
>> Now that the changes are upstream, I tried to fix it in both the SMMU
>> and the GIC ITS code based on this suggestion, the changes needed are at:
>>  https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
>>
>> The problems with this approach are:
>>  - of the 14 uses of pci_for_each_dma_alias in the function in the kernel
>>    tree, I have to fixup 6 callers (which is all but one of the callers
>>    outside x86)
>>  - 4 of these can be reasonably handled (please see the github repo above),
>>    but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
>>    drivers/iommu/iommu.c cannot be reasonably fixed up.
>>  - Even without the 2 above two changes I can get it to work for now.
>>    But pci_for_each_dma_alias does not work as expected on this platform
>>    and we have to be aware of that for all future uses of the function.
>>
>> For now, I have ruled out the approach, and I have rebased the earlier
>> patch on to 4.11-rc and am submitting it again for review. The changes are:
>>
>> v3->v4:
>>  - new address of author
>>
>> v2->v3:
>>  - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
>>    PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>>  - updated commit message to make the quirk clearer.
>>
>> Let me know your comments and suggestions.
>>
>> Thanks,
>> JC.
>>
>>
>> Jayachandran C (2):
>>   PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>>   PCI: quirks: Fix ThunderX2 dma alias handling
>>
>>  drivers/pci/quirks.c | 14 ++++++++++++++
>>  drivers/pci/search.c |  4 ++++
>>  include/linux/pci.h  |  2 ++
>>  3 files changed, 20 insertions(+)
>>
>> --
>> 2.7.4
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-11 15:27           ` Jayachandran C
  (?)
@ 2017-04-12 16:21             ` Bjorn Helgaas
  -1 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-12 16:21 UTC (permalink / raw)
  To: Jayachandran C
  Cc: linux-pci, Joerg Roedel, Alex Williamson, iommu, Jon Masters,
	Robin Murphy, linux-arm-kernel

On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > [+cc Joerg]
> > 
> > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > Hi Jayachandran,
> > > > 
> > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
> > > > > > PCI topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > 
> > > > > > [node level PCI bridges - one per node]
> > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > >             [External PCI devices connected to PCIe links]
> > > > > 
> > > > > The top two levels of bridges should have introduced aliases since they
> > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > node level bridges do not introduce an alias either.
> > > > > 
> > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > SoC PCI devices.
> > > > > 
> > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > 
> > > > Can you supply some text here about why we want to apply this patch?
> > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > performance, avoid a crash, etc?
> > > 
> > > If this is for the commit message, I hope the following is ok:
> > > 
> > > "With this change, both MSI-X and IO virtualization work correctly on
> > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > devices, and the IOMMU groups are setup correctly."
> > 
> > This doesn't get at what the actual problem is.  I'm hoping for
> > something like "without this change, we set up an IOMMU mapping for
> > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > results in an IOMMU fault"
> 
> Ok. I hope this would be better:
> 
> "Without this change, the last alias seen while traversing the PCI
> hierarchy will be used as the RID to generate the device ID for ITS
> and stream ID for SMMU. This in turn causes the MSI-X generated by the
> device to fail since the ITS expects to have translation tables based
> on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> device DMA also fails when SMMU is enabled due to incorrect value in
> SMMU translation tables"

This description is true, but I don't think it addresses the real
problem.  I think the real problem is that your IOMMU code doesn't
handle aliases correctly, and by ignoring these invalid aliases, we
happen to map an alias that works for the builtin devices.  But that's
only because we got lucky (those devices use a single RID and they're
not behind bridges that optionally take ownership).

It would make sense to me if we fixed the IOMMU code to map *all* the
aliases, which should be enough to make your devices work.  If we then
wanted to apply a patch like this on top, it would be simply an
optimization that avoids unnecessary IOMMU mappings.
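To make the difference between mapping only the last alias and mapping all
of them concrete, here is a minimal userspace sketch. It is not kernel code:
every model_* name is hypothetical, and the callback type is a simplified
stand-in for the one pci_for_each_dma_alias() takes (the real callback also
receives the struct pci_dev). The requester IDs in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ALIASES 8

/* Hypothetical container for every requester ID a device may use. */
struct model_alias_set {
	uint16_t rid[MODEL_MAX_ALIASES];
	size_t count;
};

/* Simplified callback type: the real one also receives a struct pci_dev *. */
typedef int (*model_alias_cb)(uint16_t alias, void *data);

/* Feed each alias to fn in order; a non-zero return stops the walk. */
int model_for_each_alias(const uint16_t *aliases, size_t n,
			 model_alias_cb fn, void *data)
{
	size_t i;
	int ret;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		ret = fn(aliases[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* "Last one wins": each call overwrites the previously seen alias. */
int model_keep_last(uint16_t alias, void *data)
{
	*(uint16_t *)data = alias;
	return 0;
}

/* Collect every distinct alias so that all of them can be mapped. */
int model_collect_all(uint16_t alias, void *data)
{
	struct model_alias_set *set = data;
	size_t i;

	for (i = 0; i < set->count; i++)
		if (set->rid[i] == alias)
			return 0;	/* already recorded */
	if (set->count == MODEL_MAX_ALIASES)
		return -1;		/* no room left in the model */
	set->rid[set->count++] = alias;
	return 0;
}

/* Return 1 if rid was recorded in the set, 0 otherwise. */
int model_alias_set_contains(const struct model_alias_set *set, uint16_t rid)
{
	size_t i;

	for (i = 0; i < set->count; i++)
		if (set->rid[i] == rid)
			return 1;
	return 0;
}
```

With a device that reports the IDs 0x0200, 0x0201 and 0x0100 in that order,
model_keep_last() ends up with only 0x0100, while model_collect_all() records
all three, so a lookup for 0x0201 succeeds only in the second case.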

> > I suspect the reason this patch makes a difference is because the
> > current pci_for_each_dma_alias() believes one of those top-level
> > bridges is an alias, and the iterator produces it last, so that's the
> > one you map.  The IOMMU is attached lower down, so that top-level
> > bridge is not in fact an alias, but since you only look at the *last*
> > one, you don't map the correct aliases from lower down in the tree.
> 
> Exactly. The IORT spec allows a range of RIDs to map to an SMMU, which
> means that a PCI RC can have multiple SMMUs, each handling a subset of RIDs.
> 
> In the case of Cavium ThunderX2, the RID which we should see on the RC
> - if we follow the standard and factor in the aliasing introduced by the
> PCI bridge and the PCI/PCIe bridge - is not the RID seen by the SMMU (or
> ITS).
> 
> But, if we stop the traversal at the point where SMMU (or ITS) is
> attached, we will get the correct RID as seen by these.

There is a single "correct RID" for your builtin SATA and USB, but in
general there is no single RID.

> > Stopping the iterator earlier happens to make the last alias be one of
> > the correct ones, but it doesn't solve the problems of quirked devices
> > that can use multiple requester IDs, and it doesn't solve the problem
> > of PCIe-to-PCI bridges that optionally take ownership of transactions.
>  
> If these happen below the point where the SMMU is attached, we will
> consider the last alias introduced, which should be ok. If they are
> above, the alias introduced is not relevant.  Devices with multiple
> aliases are not handled anywhere in ARM code, so I don't think we should
> consider that here.

I think we *should* consider it here.  The multiple alias situation is
generic PCIe, independent of ARM.  If you want to support arbitrary
PCIe plugin devices, you have to handle it.  I think any device with a
quirk that calls pci_add_dma_alias() will currently fail on your
system.  And I think devices behind a bridge that optionally takes
ownership of DMA transactions will also fail.

Bjorn

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-12 18:10               ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-12 18:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Joerg Roedel, Alex Williamson, iommu, Jon Masters,
	Robin Murphy, linux-arm-kernel

On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > [+cc Joerg]
> > > 
> > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > Hi Jayachandran,
> > > > > 
> > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
> > > > > > > PCI topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > 
> > > > > > > [node level PCI bridges - one per node]
> > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > >             [External PCI devices connected to PCIe links]
> > > > > > 
> > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > node level bridges do not introduce an alias either.
> > > > > > 
> > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > SoC PCI devices.
> > > > > > 
> > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > 
> > > > > Can you supply some text here about why we want to apply this patch?
> > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > performance, avoid a crash, etc?
> > > > 
> > > > If this is for the commit message, I hope the following is ok:
> > > > 
> > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > devices, and the IOMMU groups are setup correctly."
> > > 
> > > This doesn't get at what the actual problem is.  I'm hoping for
> > > something like "without this change, we set up an IOMMU mapping for
> > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > results in an IOMMU fault"
> > 
> > Ok. I hope this would be better:
> > 
> > "Without this change, the last alias seen while traversing the PCI
> > hierarchy will be used as the RID to generate the device ID for ITS
> > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > device to fail since the ITS expects to have translation tables based
> > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > device DMA also fails when SMMU is enabled due to incorrect value in
> > SMMU translation tables"
> 
> This description is true, but I don't think it addresses the real
> problem.  I think the real problem is that your IOMMU code doesn't
> handle aliases correctly, and by ignoring these invalid aliases, we
> happen to map an alias that works for the builtin devices.  But that's
> only because we got lucky (those devices use a single RID and they're
> not behind bridges that optionally take ownership).
> 
> It would make sense to me if we fixed the IOMMU code to map *all* the
> aliases, which should be enough to make your devices work.  If we then
> wanted to apply a patch like this on top, it would be simply an
> optimization that avoids unnecessary IOMMU mappings.

The issue that the IOMMU code does not handle valid aliases is
unrelated to what I am trying to fix. The quirk is to make sure
that invalid aliases are not seen on ThunderX2 while doing
pci_for_each_dma_alias().

The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
where the SMMU (or ITS) is attached, i.e. at the bridge marked with
PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
for aliases above that point.
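As an illustration of the behaviour described above, here is a small
userspace model of the upward walk. It is only a sketch: struct model_dev,
its fields and the model_* helpers are hypothetical and do not mirror
struct pci_dev or the real body of pci_for_each_dma_alias(); the xlate_root
field merely stands in for PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct pci_dev. */
struct model_dev {
	const char *name;
	uint16_t rid;             /* requester ID: (bus << 8) | devfn */
	bool introduces_alias;    /* a conventional bridge here would alias DMA */
	bool xlate_root;          /* models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */
	struct model_dev *parent; /* upstream bridge, NULL at the top */
};

typedef int (*model_alias_fn)(const struct model_dev *dev, uint16_t alias,
			      void *data);

/*
 * Report the device's own RID, then walk towards the root and report the
 * RID of each aliasing bridge. The walk stops at the first bridge flagged
 * xlate_root, so nothing at or above that bridge is reported.
 */
int model_for_each_dma_alias(const struct model_dev *dev, model_alias_fn fn,
			     void *data)
{
	const struct model_dev *p;
	int ret;

	assert(dev != NULL && fn != NULL);
	ret = fn(dev, dev->rid, data);
	if (ret)
		return ret;

	for (p = dev->parent; p; p = p->parent) {
		if (p->xlate_root)
			break;	/* translation unit is attached here */
		if (p->introduces_alias) {
			ret = fn(p, p->rid, data);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Callback that remembers only the most recent alias it was given. */
int model_record_last(const struct model_dev *dev, uint16_t alias, void *data)
{
	(void)dev;
	*(uint16_t *)data = alias;
	return 0;
}

/* The RID a "last alias wins" caller would program into its tables. */
uint16_t model_last_alias(const struct model_dev *dev)
{
	uint16_t last = 0;

	model_for_each_dma_alias(dev, model_record_last, &last);
	return last;
}
```

In this model, an endpoint below a root port, a glue bridge and a node
bridge yields the node bridge's RID as the last alias when no bridge is
flagged. Once the root port is flagged, the last alias is the endpoint's
own RID.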

The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
the on-chip devices are directly connected to the ITS (they do not
use SMMU).

The top two levels of bridges are not real PCI bridges but just
PCI bridge-like things that were added to tie the whole hierarchy
together for configuration and enumeration. They do not handle
PCI/PCIe transactions in the traditional sense.

I think my problem description is still not correct, maybe:
"The SMMU (and ITS) expect the device tables to use the RID seen
at the bridge they are associated with. Currently the
pci_for_each_dma_alias() code traverses beyond this point and
generates incorrect aliases due to the PCI and PCI/PCIe bridges
above. This causes MSI-X interrupts and device DMA to fail since
the SMMU and ITS tables are set up with incorrect IDs"
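For reference, a RID is the 16-bit bus/device/function number of the
requester. The sketch below repeats the arithmetic of the kernel's
PCI_DEVFN() and PCI_DEVID() macros in standalone form; the model_* function
names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* devfn: 5-bit device (slot) number in bits 7:3, 3-bit function in bits 2:0. */
uint8_t model_devfn(unsigned int slot, unsigned int func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* RID: 8-bit bus number in bits 15:8, devfn in bits 7:0. */
uint16_t model_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}
```

For example, device 0, function 0 on bus 2 gives RID 0x0200. If a walk
instead reports an upstream bridge on bus 0 as the last alias, the tables
end up keyed on that bridge's bus 0 RID rather than on 0x0200.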

> > > I suspect the reason this patch makes a difference is because the
> > > current pci_for_each_dma_alias() believes one of those top-level
> > > bridges is an alias, and the iterator produces it last, so that's the
> > > one you map.  The IOMMU is attached lower down, so that top-level
> > > bridge is not in fact an alias, but since you only look at the *last*
> > > one, you don't map the correct aliases from lower down in the tree.
> > 
> > Exactly. The IORT spec allows a range of RIDs to map to an SMMU, which
> > means that a PCI RC can have multiple SMMUs, each handling a subset of RIDs.
> > 
> > In the case of Cavium ThunderX2, the RID which we should see on the RC
> > - if we follow the standard and factor in the aliasing introduced by the
> > PCI bridge and the PCI/PCIe bridge - is not the RID seen by the SMMU (or
> > ITS).
> > 
> > But, if we stop the traversal at the point where SMMU (or ITS) is
> > attached, we will get the correct RID as seen by these.
> 
> There is a single "correct RID" for your builtin SATA and USB, but in
> general there is no single RID.

In the case of external PCIe devices too, the RID (or aliases) seen up to
the point where the SMMU or ITS is connected (i.e., the bridge marked with
the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT) are correct.
 
> > > Stopping the iterator earlier happens to make the last alias be one of
> > > the correct ones, but it doesn't solve the problems of quirked devices
> > > that can use multiple requester IDs, and it doesn't solve the problem
> > > of PCIe-to-PCI bridges that optionally take ownership of transactions.
> >  
> > If these happen below the point where the SMMU is attached, we will
> > consider the last alias introduced, which should be ok. If they are
> > above, the alias introduced is not relevant.  Devices with multiple
> > aliases are not handled anywhere in ARM code, so I don't think we should
> > consider that here.
> 
> I think we *should* consider it here.  The multiple alias situation is
> generic PCIe, independent of ARM.  If you want to support arbitrary
> PCIe plugin devices, you have to handle it.  I think any device with a
> quirk that calls pci_add_dma_alias() will currently fail on your
> system.  And I think devices behind a bridge that optionally takes
> ownership of DMA transactions will also fail.
 
The issue that the IOMMU code does not handle all aliases has to be addressed
separately (and, judging from his response, Robin seems to be looking at this).

And even when that is addressed, this quirk is still needed on ThunderX2,
as I pointed out above.

Hope I am still not missing the point, and thanks for your patience here.
JC.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-12 18:10               ` Jayachandran C
  (?)
@ 2017-04-12 19:11                 ` Bjorn Helgaas
  -1 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-12 19:11 UTC (permalink / raw)
  To: Jayachandran C
  Cc: linux-pci, Joerg Roedel, iommu, Alex Williamson, Jon Masters,
	Robin Murphy, linux-arm-kernel

On Wed, Apr 12, 2017 at 06:10:34PM +0000, Jayachandran C wrote:
> On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > [+cc Joerg]
> > > > 
> > > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > Hi Jayachandran,
> > > > > > 
> > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > 
> > > > > > > [node level PCI bridges - one per node]
> > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > >     [PCI-PCIe "glue" bridges - upto 14, one per real port below]
> > > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > >             [External PCI devices connected to PCIe links]
> > > > > > > 
> > > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > > node level bridges do not introduce an alias either.
> > > > > > > 
> > > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > > SoC PCI devices.
> > > > > > > 
> > > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > 
> > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > performance, avoid a crash, etc?
> > > > > 
> > > > > If this is for the commit message, I hope the following is ok:
> > > > > 
> > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > devices, and the IOMMU groups are setup correctly."
> > > > 
> > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > something like "without this change, we set up an IOMMU mapping for
> > > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > > results in an IOMMU fault"
> > > 
> > > Ok. I hope this would be better:
> > > 
> > > "Without this change, the last alias seen while traversing the PCI
> > > hierarchy will be used as the RID to generate the device ID for ITS
> > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > device to fail since the ITS expects to have translation tables based
> > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > SMMU translation tables"
> > 
> > This description is true, but I don't think it addresses the real
> > problem.  I think the real problem is that your IOMMU code doesn't
> > handle aliases correctly, and by ignoring these invalid aliases, we
> > happen to map an alias that works for the builtin devices.  But that's
> > only because we got lucky (those devices use a single RID and they're
> > not behind bridges that optionally take ownership).
> > 
> > It would make sense to me if we fixed the IOMMU code to map *all* the
> > aliases, which should be enough to make your devices work.  If we then
> > wanted to apply a patch like this on top, it would be simply an
> > optimization that avoids unnecessary IOMMU mappings.
> 
> The issue that the IOMMU code does not handle valid aliases is
> unrelated to what I am trying to fix. The quirk is to make sure
> that invalid aliases are not seen on ThunderX2 while doing
> pci_for_each_dma_alias().
> 
> The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
> where the SMMU (or ITS) is attached, i.e. at the bridge marked with
> PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
> for aliases above that point.
> 
> The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
> the on-chip devices are directly connected to the ITS (they do not
> use SMMU).
> 
> The top two levels of bridges are not real PCI bridges but just
> PCI bridge-like things that were added to tie the whole hierarchy
> together for configuration and enumeration. They do not handle
> PCI/PCIe transactions in the traditional sense.
> 
> I think my problem description is still not correct, maybe:
> "The SMMU (and ITS) expect the device tables to use the RID seen
> at the bridge they are associated with. Currently the
> pci_for_each_dma_alias() code traverses beyond this point and
> generates incorrect aliases due to the PCI and PCI/PCIe bridges
> above. This causes MSI-X interrupts and device DMA to fail since
> the SMMU and ITS tables are set up with incorrect IDs"

I haven't tried to figure out the MSI-X piece of this, but let me try
to come up with a concrete DMA example.  Assume this topology:

  00:00.0 bridge to [bus 01-1e]
  01:0a.0 bridge to [bus 04-05]
  04:00.0 [14e4:9000 or 9084] bridge to [bus 05] (XLATE_ROOT)
  05:00.0 endpoint

Assume 05:00.0 generates a DMA request.  Assume the top two bridges
are such that pci_for_each_dma_alias() includes them as well, so it
iterates through 05:00.0, 01:0a.0, and 00:00.0.

When the driver for 05:00.0 makes a DMA mapping, the current code
apparently makes an IOMMU mapping for requester ID 00:00.0 because
that's the last alias.  Obviously this doesn't work because the IOMMU
at 04:00.0 will see a requester ID of 05:00.0, not 00:00.0.

With this quirk, we'll omit 01:0a.0 and 00:00.0, so we'll make an
IOMMU mapping for requester ID 05:00.0, which will work fine.  I think
it would *also* work fine if we made IOMMU mappings for 05:00.0,
01:0a.0, and 00:00.0.  The last two are unnecessary, but probably not
harmful.
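
To make the walk above concrete, here is a small user-space C model of
it.  It is only a sketch: struct model_dev, model_walk() and
example_walk() are made-up names, not kernel APIs, and the adds_alias
field simply encodes the assumption above that the top two bridges are
reported as aliases.  The xlate_root field stands in for
PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Requester ID layout: bus[15:8], device[7:3], function[2:0]. */
#define RID(bus, dev, fn) ((uint16_t)(((bus) << 8) | ((dev) << 3) | (fn)))

/* Hypothetical model of a device or bridge; this is not struct pci_dev. */
struct model_dev {
	uint16_t rid;
	const struct model_dev *parent; /* upstream bridge, NULL at the top */
	bool adds_alias;  /* assumed: the walk reports this bridge as an alias */
	bool xlate_root;  /* models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */
};

/*
 * Record the device's own RID, then walk upstream collecting bridge
 * aliases.  With honor_quirk set, stop at the first XLATE_ROOT bridge.
 * Returns the number of RIDs written to out[].
 */
int model_walk(const struct model_dev *dev, bool honor_quirk,
	       uint16_t *out, int max)
{
	int n = 0;

	assert(max > 0);
	out[n++] = dev->rid;
	for (const struct model_dev *b = dev->parent; b; b = b->parent) {
		if (honor_quirk && b->xlate_root)
			break;
		if (b->adds_alias && n < max)
			out[n++] = b->rid;
	}
	return n;
}

/* The example topology: 00:00.0 -> 01:0a.0 -> 04:00.0 -> 05:00.0. */
int example_walk(bool honor_quirk, uint16_t *out, int max)
{
	/* The node-level bridge is also flagged, as in the patch description. */
	const struct model_dev node = { RID(0x00, 0x00, 0), NULL,  true,  true  };
	const struct model_dev glue = { RID(0x01, 0x0a, 0), &node, true,  false };
	const struct model_dev port = { RID(0x04, 0x00, 0), &glue, false, true  };
	const struct model_dev ep   = { RID(0x05, 0x00, 0), &port, false, false };

	return model_walk(&ep, honor_quirk, out, max);
}
```

Under these assumptions, example_walk(false, ...) yields 05:00.0,
01:0a.0, 00:00.0, and example_walk(true, ...) yields only 05:00.0.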

Now assume 05:00 is a multi-function device that has a DMA alias
quirk, e.g., see quirk_dma_func0_alias().  It has another function:

  05:00.3 endpoint

DMA from 05:00.3 may use a requester ID of either 05:00.3 or 05:00.0.
The driver makes a DMA mapping, pci_for_each_dma_alias() iterates
through 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, and we again map only
00:00.0, which again doesn't work.

With this quirk, we create a single mapping for 05:00.0.  That will
work sometimes, but the device may also generate DMA with a requester
ID of 05:00.3, and that won't work.

If your on-chip device is, e.g., 01:04.0, pci_for_each_dma_alias()
probably iterates through 01:04.0, 00:00.0.  Today we make an IOMMU
mapping for 00:00.0, which doesn't work.  With this quirk, we'll
ignore 00:00.0 and make a mapping for 01:04.0, which does work.  But I
think if you made the IOMMU code add mappings for each of the aliases,
i.e., for both 01:04.0 and 00:00.0, that device *would* work even
without this quirk.
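
The difference between mapping only the last alias and mapping every
alias can be sketched in the same user-space style.  Again this is
illustrative only: struct rid_map and the map_*() helpers are
hypothetical names, and the bitmap is just a stand-in for whatever
per-requester-ID state the IOMMU driver keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Requester ID layout: bus[15:8], device[7:3], function[2:0]. */
#define RID(bus, dev, fn) ((uint16_t)(((bus) << 8) | ((dev) << 3) | (fn)))

/* Hypothetical stand-in for the IOMMU's per-RID table: one bit per RID. */
struct rid_map {
	uint8_t bits[65536 / 8];
};

void rid_map_add(struct rid_map *m, uint16_t rid)
{
	m->bits[rid / 8] |= (uint8_t)(1u << (rid % 8));
}

bool rid_map_has(const struct rid_map *m, uint16_t rid)
{
	return (m->bits[rid / 8] & (1u << (rid % 8))) != 0;
}

/* Policy 1: only the last alias reported by the walk is mapped. */
void map_last_alias(struct rid_map *m, const uint16_t *aliases, int n)
{
	assert(n > 0);
	rid_map_add(m, aliases[n - 1]);
}

/* Policy 2: every alias reported by the walk is mapped. */
void map_all_aliases(struct rid_map *m, const uint16_t *aliases, int n)
{
	for (int i = 0; i < n; i++)
		rid_map_add(m, aliases[i]);
}
```

For the alias list {05:00.3, 05:00.0}, the first policy leaves 05:00.3
unmapped, while the second covers both requester IDs.  For {01:04.0,
00:00.0}, the second policy maps 01:04.0 even though the walk did not
stop early.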

Bjorn

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-12 20:41                   ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-12 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Joerg Roedel, Alex Williamson, iommu, Jon Masters,
	Robin Murphy, linux-arm-kernel

On Wed, Apr 12, 2017 at 02:11:38PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 12, 2017 at 06:10:34PM +0000, Jayachandran C wrote:
> > On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > > [+cc Joerg]
> > > > > 
> > > > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > > Hi Jayachandran,
> > > > > > > 
> > > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > > 
> > > > > > > > [node level PCI bridges - one per node]
> > > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > > >             [External PCI devices connected to PCIe links]
> > > > > > > > 
> > > > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > > > node level bridges do not introduce an alias either.
> > > > > > > > 
> > > > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > > > SoC PCI devices.
> > > > > > > > 
> > > > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > > 
> > > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > > performance, avoid a crash, etc?
> > > > > > 
> > > > > > If this is for the commit message, I hope the following is ok:
> > > > > > 
> > > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > > devices, and the IOMMU groups are set up correctly."
> > > > > 
> > > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > > something like "without this change, we set up an IOMMU mapping for
> > > > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > > > results in an IOMMU fault"
> > > > 
> > > > Ok. I hope this would be better:
> > > > 
> > > > "Without this change, the last alias seen while traversing the PCI
> > > > hierarchy will be used as the RID to generate the device ID for ITS
> > > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > > device to fail since the ITS expects to have translation tables based
> > > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > > SMMU translation tables"
> > > 
> > > This description is true, but I don't think it addresses the real
> > > problem.  I think the real problem is that your IOMMU code doesn't
> > > handle aliases correctly, and by ignoring these invalid aliases, we
> > > happen to map an alias that works for the builtin devices.  But that's
> > > only because we got lucky (those devices use a single RID and they're
> > > not behind bridges that optionally take ownership).
> > > 
> > > It would make sense to me if we fixed the IOMMU code to map *all* the
> > > aliases, which should be enough to make your devices work.  If we then
> > > wanted to apply a patch like this on top, it would be simply an
> > > optimization that avoids unnecessary IOMMU mappings.
> > 
> > The issue that the IOMMU code does not handle valid aliases is
> > unrelated to what I am trying to fix. The quirk is to make sure
> > that invalid aliases are not seen on ThunderX2 while doing
> > pci_for_each_dma_alias().
> > 
> > The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
> > where the SMMU (or ITS) is attached, i.e. at the bridge marked with
> > PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
> > for aliases above that point.
> > 
> > The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
> > the on-chip devices are directly connected to the ITS (they do not
> > use SMMU).
> > 
> > The top two levels of bridges are not real PCI bridges but just
> > PCI bridge-like things that were added to tie the whole hierarchy
> > together for configuration and enumeration. They do not handle
> > PCI/PCIe transactions in the traditional sense.
> > 
> > I think my problem description is still not correct, maybe:
> > "The SMMU (and ITS) expects the device tables to use the RID seen
> > at the bridge they are associated with. Currently the
> > pci_for_each_dma_alias() code traverses beyond this point and
> > generates incorrect aliases due to the PCI and PCI/PCIe bridges
> > above. This causes MSI-X interrupts and device DMA to fail since
> > the SMMU and ITS tables are set up with incorrect IDs"
> 
> I haven't tried to figure out the MSI-X piece of this, but let me try
> to come up with a concrete DMA example.  Assume this topology:
> 
>   00:00.0 bridge to [bus 01-1e]
>   01:0a.0 bridge to [bus 04-05]
>   04:00.0 [14e4:9000 or 9084] bridge to [bus 05] (XLATE_ROOT)
>   05:00.0 endpoint
> 
> Assume 05:00.0 generates a DMA request.  Assume the top two bridges
> are such that pci_for_each_dma_alias() includes them as well, so it
> iterates through 05:00.0, 01:0a.0, and 00:00.0.
> 
> When the driver for 05:00.0 makes a DMA mapping, the current code
> apparently makes an IOMMU mapping for requester ID 00:00.0 because
> that's the last alias.  Obviously this doesn't work because the IOMMU
> at 04:00.0 will see a requester ID of 05:00.0, not 00:00.0.
> 
> With this quirk, we'll omit 01:0a.0 and 00:00.0, so we'll make an
> IOMMU mapping for requester ID 05:00.0, which will work fine.  I think
> it would *also* work fine if we made IOMMU mappings for 05:00.0,
> 01:0a.0, and 00:00.0.  The last two are unnecessary, but probably not
> harmful.
 
Ok. The last two are not harmful but still incorrect. The bridges
00:00.0 and 01:0a.0 are outside the path that the PCI transactions take
(they go from 04:00.0 to the SMMU).

> Now assume 05:00 is a multi-function device that has a DMA alias
> quirk, e.g., see quirk_dma_func0_alias().  It has another function:
> 
>   05:00.3 endpoint
> 
> DMA from 05:00.3 may use a requester ID of either 05:00.3 or 05:00.0.
> The driver makes a DMA mapping, pci_for_each_dma_alias() iterates
> through 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, and we again map only
> 00:00.0, which again doesn't work.
> 
> With this quirk, we create a single mapping for 05:00.0.  That will
> work sometimes, but the device may also generate DMA with a requester
> ID of 05:00.3, and that won't work.

Note that here, pci_for_each_dma_alias() works correctly with the quirk
and incorrectly without the quirk. With the quirk, the callback function
is invoked on 05:00.3 and 05:00.0 as expected. Without the quirk, the
callback is invoked on 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, which
includes aliases that are not valid.

The idea behind the quirk is to get pci_for_each_dma_alias() to work
correctly on ThunderX2 and invoke the function only on the valid aliases.
With the quirk, the code using it (like the SMMU code) gets the correct
aliases to process and does not have to deal with or filter out incorrect
aliases.
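
For illustration only (this is not code from the patch), the with/without
behaviour described above can be sketched as a small userspace C model.
Everything named here is a hypothetical stand-in: struct model_dev and its
fields, rid_of(), record_alias() and model_for_each_dma_alias() are
assumptions made for the example, not the kernel's struct pci_dev or the
real pci_for_each_dma_alias(). The model assumes, as in the example quoted
above, that the two upper bridges each contribute their own ID as an alias
and that the PCIe root port does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel layout. */
struct model_dev {
	uint8_t bus, dev, fn;
	int xlate_root;   /* models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */
	int func0_alias;  /* models a quirk_dma_func0_alias()-style quirk */
	int adds_alias;   /* assumed: this bridge contributes its own ID */
	const struct model_dev *parent; /* upstream bridge, NULL at the top */
};

typedef int (*alias_fn)(uint16_t rid, void *data);

/* Requester ID encoding: bus[15:8], device[7:3], function[2:0]. */
static uint16_t rid_of(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Example callback: records every alias it is handed, in order. */
struct alias_log {
	uint16_t rid[8];
	size_t n;
};

static int record_alias(uint16_t rid, void *data)
{
	struct alias_log *log = data;

	assert(log->n < sizeof(log->rid) / sizeof(log->rid[0]));
	log->rid[log->n++] = rid;
	return 0;
}

/*
 * Simplified walk: report the device itself, then its function-0 alias
 * if it has one, then every upstream bridge that adds an alias. When
 * honor_xlate_root is set, stop at the first bridge marked xlate_root,
 * so nothing above the SMMU/ITS attachment point is reported.
 */
static int model_for_each_dma_alias(const struct model_dev *d,
				    int honor_xlate_root,
				    alias_fn fn, void *data)
{
	const struct model_dev *p;
	int ret;

	ret = fn(rid_of(d->bus, d->dev, d->fn), data);
	if (ret)
		return ret;

	if (d->func0_alias && d->fn != 0) {
		ret = fn(rid_of(d->bus, d->dev, 0), data);
		if (ret)
			return ret;
	}

	for (p = d->parent; p; p = p->parent) {
		if (honor_xlate_root && p->xlate_root)
			break;
		if (p->adds_alias) {
			ret = fn(rid_of(p->bus, p->dev, p->fn), data);
			if (ret)
				return ret;
		}
	}
	return 0;
}
```

Building the example topology (00:00.0 and 04:00.0 marked xlate_root,
00:00.0 and 01:0a.0 with adds_alias set) and walking from 05:00.3 with
func0_alias set, the model reports 05:00.3 and 05:00.0 when
honor_xlate_root is 1, and 05:00.3, 05:00.0, 01:0a.0 and 00:00.0 when it
is 0.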

I don't think it would be safe to assume that all callers of
pci_for_each_dma_alias() can handle invalid aliases. Even in the
IOMMU case, calculating stream IDs is only one usage; the usage for
iommu groups also did not deal with invalid aliases correctly when I
tried it. Then there are the MSI cases too. I had done some work on
filtering out invalid aliases in the relevant code[1], but in my
opinion, getting pci_for_each_dma_alias() to do the right thing is
the correct solution.

> If your on-chip device is, e.g., 01:04.0, pci_for_each_dma_alias()
> probably iterates through 01:04.0, 00:00.0.  Today we make an IOMMU
> mapping for 00:00.0, which doesn't work.  With this quirk, we'll
> ignore 00:00.0 and make a mapping for 01:04.0, which does work.  But I
> think if you made the IOMMU code add mappings for each of the aliases,
> i.e., for both 01:04.0 and 00:00.0, that device *would* work even
> without this quirk.

There is no IOMMU for the on-chip devices, but the MSI-X interrupts are
looked up in a table based on DeviceID(RID) by the ARM interrupt
controller. The issue here is similar.

JC.

[1] https://www.spinics.net/lists/arm-kernel/msg573744.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-12 20:41                   ` Jayachandran C
  0 siblings, 0 replies; 64+ messages in thread
From: Jayachandran C @ 2017-04-12 20:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 12, 2017 at 02:11:38PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 12, 2017 at 06:10:34PM +0000, Jayachandran C wrote:
> > On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > > [+cc Joerg]
> > > > > 
> > > > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > > Hi Jayachandran,
> > > > > > > 
> > > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > > 
> > > > > > > > [node level PCI bridges - one per node]
> > > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > > >             [External PCI devices connected to PCIe links]
> > > > > > > > 
> > > > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > > > node level bridges do not introduce an alias either.
> > > > > > > > 
> > > > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > > > SoC PCI devices.
> > > > > > > > 
> > > > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > > 
> > > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > > performance, avoid a crash, etc?
> > > > > > 
> > > > > > If this is for the commit message, I hope the following is ok:
> > > > > > 
> > > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > > devices, and the IOMMU groups are set up correctly."
> > > > > 
> > > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > > something like "without this change, we set up an IOMMU mapping for
> > > > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > > > results in an IOMMU fault"
> > > > 
> > > > Ok. I hope this would be better:
> > > > 
> > > > "Without this change, the last alias seen while traversing the PCI
> > > > hierarchy will be used as the RID to generate the device ID for ITS
> > > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > > device to fail since the ITS expects to have translation tables based
> > > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > > SMMU translation tables"
> > > 
> > > This description is true, but I don't think it addresses the real
> > > problem.  I think the real problem is that your IOMMU code doesn't
> > > handle aliases correctly, and by ignoring these invalid aliases, we
> > > happen to map an alias that works for the builtin devices.  But that's
> > > only because we got lucky (those devices use a single RID and they're
> > > not behind bridges that optionally take ownership).
> > > 
> > > It would make sense to me if we fixed the IOMMU code to map *all* the
> > > aliases, which should be enough to make your devices work.  If we then
> > > wanted to apply a patch like this on top, it would be simply an
> > > optimization that avoids unnecessary IOMMU mappings.
> > 
> > The issue that the IOMMU code does not handle valid aliases is
> > unrelated to what I am trying to fix. The quirk is to make sure
> > that invalid aliases are not seen on ThunderX2 while doing
> > pci_for_each_dma_alias().
> > 
> > The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
> > where the SMMU (or ITS) is attached, i.e. at the bridge marked with
> > PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
> > for aliases above that point.
> > 
> > The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
> > the on-chip devices are directly connected to the ITS (they do not
> > use SMMU).
> > 
> > The top two levels of bridges are not real PCI bridges but just
> > PCI bridge-like things that were added to tie the whole hierarchy
> > together for configuration and enumeration. They do not handle
> > PCI/PCIe transactions in the traditional sense.
> > 
> > I think my problem description is still not correct, maybe:
> > "The SMMU (and ITS) expect the device tables to use the RID seen
> > at the bridge they are associated with. Currently the
> > pci_for_each_dma_alias() code traverses beyond this point and
> > generates incorrect aliases due to the PCI and PCI/PCIe bridges
> > above. This causes MSI-X interrupts and device DMA to fail since
> > the SMMU and ITS tables are set up with incorrect IDs"
> 
> I haven't tried to figure out the MSI-X piece of this, but let me try
> to come up with a concrete DMA example.  Assume this topology:
> 
>   00:00.0 bridge to [bus 01-1e]
>   01:0a.0 bridge to [bus 04-05]
>   04:00.0 [14e4:9000 or 9084] bridge to [bus 05] (XLATE_ROOT)
>   05:00.0 endpoint
> 
> Assume 05:00.0 generates a DMA request.  Assume the top two bridges
> are such that pci_for_each_dma_alias() includes them as well, so it
> iterates through 05:00.0, 01:0a.0, and 00:00.0.
> 
> When the driver for 05:00.0 makes a DMA mapping, the current code
> apparently makes an IOMMU mapping for requester ID 00:00.0 because
> that's the last alias.  Obviously this doesn't work because the IOMMU
> at 04:00.0 will see a requester ID of 05:00.0, not 00:00.0.
> 
> With this quirk, we'll omit 01:0a.0 and 00:00.0, so we'll make an
> IOMMU mapping for requester ID 05:00.0, which will work fine.  I think
> it would *also* work fine if we made IOMMU mappings for 05:00.0,
> 01:0a.0, and 00:00.0.  The last two are unnecessary, but probably not
> harmful.
 
Ok. The last two are not harmful but still incorrect. The bridges
00:00.0 and 01:0a.0 are outside the path that the PCI transactions take
(they go from 04:00.0 to the SMMU).

> Now assume 05:00 is a multi-function device that has a DMA alias
> quirk, e.g., see quirk_dma_func0_alias().  It has another function:
> 
>   05:00.3 endpoint
> 
> DMA from 05:00.3 may use a requester ID of either 05:00.3 or 05:00.0.
> The driver makes a DMA mapping, pci_for_each_dma_alias() iterates
> through 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, and we again map only
> 00:00.0, which again doesn't work.
> 
> With this quirk, we create a single mapping for 05:00.0.  That will
> work sometimes, but the device may also generate DMA with a requester
> ID of 05:00.3, and that won't work.

Note that here, pci_for_each_dma_alias() works correctly with the quirk
and incorrectly without the quirk. With the quirk, the callback function
is invoked on 05:00.3 and 05:00.0 as expected. Without the quirk the
callback is invoked on 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, which
includes aliases that are not valid.
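
The two walks described above can be illustrated with a minimal
userspace sketch in C. This is not the kernel implementation: struct
model_dev, model_rid() and model_for_each_dma_alias() are hypothetical
names, the xlate_root field only models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT,
and the sketch assumes that the top two bridges each contribute an alias
whenever the walk reaches them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct pci_dev. */
struct model_dev {
	uint8_t bus;
	uint8_t devfn;              /* (device << 3) | function */
	bool has_func0_alias;       /* models a quirk_dma_func0_alias()-style device */
	bool takes_ownership;       /* bridge replaces the requester ID with its own */
	bool xlate_root;            /* models PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT */
	struct model_dev *upstream; /* next bridge towards the root, or NULL */
};

typedef int (*model_alias_fn)(uint16_t rid, void *data);

static uint16_t model_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/*
 * Report every requester ID the translation unit might see for dev.
 * The walk stops at the first bridge flagged xlate_root, so bridges
 * above the SMMU/ITS attachment point never contribute aliases.
 */
static int model_for_each_dma_alias(const struct model_dev *dev,
				    model_alias_fn fn, void *data)
{
	const struct model_dev *bridge;
	int ret;

	/* The device's own requester ID always comes first. */
	ret = fn(model_rid(dev->bus, dev->devfn), data);
	if (ret)
		return ret;

	/* A non-zero function may also issue requests as function 0. */
	if (dev->has_func0_alias && (dev->devfn & 0x07) != 0) {
		ret = fn(model_rid(dev->bus, (uint8_t)(dev->devfn & 0xf8)), data);
		if (ret)
			return ret;
	}

	for (bridge = dev->upstream; bridge; bridge = bridge->upstream) {
		if (bridge->xlate_root)
			break; /* translation happens here; stop walking */
		if (bridge->takes_ownership) {
			ret = fn(model_rid(bridge->bus, bridge->devfn), data);
			if (ret)
				return ret;
		}
	}
	return 0;
}
```

With a chain 05:00.3 -> 04:00.0 (xlate_root set) -> 01:0a.0 -> 00:00.0,
the callback is invoked for 05:00.3 and 05:00.0 only. Clearing
xlate_root on 04:00.0 lets the walk continue, and 01:0a.0 and 00:00.0
are reported as well.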

The idea behind the quirk is to get pci_for_each_dma_alias() to work
correctly on ThunderX2, and invoke the function only on the valid
aliases. With the quirk, the code using it - like the SMMU code - gets
the correct aliases to process, and does not have to deal with or
filter out incorrect aliases.

I don't think it would be safe to assume that all callers of
pci_for_each_dma_alias() can handle invalid aliases. Even in the IOMMU
case, calculating stream IDs is only one usage; the usage for IOMMU
groups also did not deal with invalid aliases correctly when I tried
it. Then there are the MSI cases too. I had done some work on trying to
filter out invalid aliases in the relevant code[1], but in my opinion,
getting pci_for_each_dma_alias() to do the right thing is the correct
solution.

> If your on-chip device is, e.g., 01:04.0, pci_for_each_dma_alias()
> probably iterates through 01:04.0, 00:00.0.  Today we make an IOMMU
> mapping for 00:00.0, which doesn't work.  With this quirk, we'll
> ignore 00:00.0 and make a mapping for 01:04.0, which does work.  But I
> think if you made the IOMMU code add mappings for each of the aliases,
> i.e., for both 01:04.0 and 00:00.0, that device *would* work even
> without this quirk.

There is no IOMMU for the on-chip devices, but the MSI-X interrupts are
looked up in a table based on DeviceID(RID) by the ARM interrupt
controller. The issue here is similar.
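
For reference, the RID mentioned here is the 16-bit bus/device/function
number. A minimal sketch follows, assuming the standard PCI encoding
(8-bit bus, 5-bit device, 3-bit function). rid_from_bdf() is a
hypothetical helper name, and how the ITS derives its DeviceID from the
RID is platform specific and not shown.

```c
#include <stdint.h>

/* Pack bus:device.function into a 16-bit requester ID (RID). */
static uint16_t rid_from_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}
```

For the on-chip device 01:04.0 this gives 0x0120, while the invalid
alias 00:00.0 gives 0x0000, so a table entry created for one is not
found under the other.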

JC.

[1] https://www.spinics.net/lists/arm-kernel/msg573744.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-12 20:41                   ` Jayachandran C
  (?)
@ 2017-04-12 23:18                     ` Bjorn Helgaas
  -1 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-12 23:18 UTC (permalink / raw)
  To: Jayachandran C
  Cc: linux-pci, Joerg Roedel, iommu, Alex Williamson, Jon Masters,
	Robin Murphy, linux-arm-kernel

On Wed, Apr 12, 2017 at 08:41:20PM +0000, Jayachandran C wrote:
> On Wed, Apr 12, 2017 at 02:11:38PM -0500, Bjorn Helgaas wrote:
> > On Wed, Apr 12, 2017 at 06:10:34PM +0000, Jayachandran C wrote:
> > > On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > > > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > > > [+cc Joerg]
> > > > > > 
> > > > > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > > > Hi Jayachandran,
> > > > > > > > 
> > > > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > > > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > > > 
> > > > > > > > > [node level PCI bridges - one per node]
> > > > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > > > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > > > >             [External PCI devices connected to PCIe links]
> > > > > > > > > 
> > > > > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > > > > node level bridges do not introduce an alias either.
> > > > > > > > > 
> > > > > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > > > > SoC PCI devices.
> > > > > > > > > 
> > > > > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > > > 
> > > > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > > > performance, avoid a crash, etc?
> > > > > > > 
> > > > > > > If this is for the commit message, I hope the following is ok:
> > > > > > > 
> > > > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > > > devices, and the IOMMU groups are set up correctly."
> > > > > > 
> > > > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > > > something like "without this change, we set up an IOMMU mapping for
> > > > > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > > > > results in an IOMMU fault"
> > > > > 
> > > > > Ok. I hope this would be better:
> > > > > 
> > > > > "Without this change, the last alias seen while traversing the PCI
> > > > > hierarchy will be used as the RID to generate the device ID for ITS
> > > > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > > > device to fail since the ITS expects to have translation tables based
> > > > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > > > SMMU translation tables"
> > > > 
> > > > This description is true, but I don't think it addresses the real
> > > > problem.  I think the real problem is that your IOMMU code doesn't
> > > > handle aliases correctly, and by ignoring these invalid aliases, we
> > > > happen to map an alias that works for the builtin devices.  But that's
> > > > only because we got lucky (those devices use a single RID and they're
> > > > not behind bridges that optionally take ownership).
> > > > 
> > > > It would make sense to me if we fixed the IOMMU code to map *all* the
> > > > aliases, which should be enough to make your devices work.  If we then
> > > > wanted to apply a patch like this on top, it would be simply an
> > > > optimization that avoids unnecessary IOMMU mappings.
> > > 
> > > The issue that the IOMMU code does not handle valid aliases is
> > > unrelated to what I am trying to fix. The quirk is to make sure
> > > that invalid aliases are not seen on ThunderX2 while doing
> > > pci_for_each_dma_alias().
> > > 
> > > The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
> > > where the SMMU (or ITS) is attached, i.e. at the bridge marked with
> > > PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
> > > for aliases above that point.
> > > 
> > > The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
> > > the on-chip devices are directly connected to the ITS (they do not
> > > use SMMU).
> > > 
> > > The top two levels of bridges are not real PCI bridges but just
> > > PCI bridge-like things that were added to tie the whole hierarchy
> > > together for configuration and enumeration. They do not handle
> > > PCI/PCIe transactions in the traditional sense.
> > > 
> > > I think my problem description is still not correct, maybe:
> > > "The SMMU (and ITS) expect the device tables to use the RID seen
> > > at the bridge they are associated with. Currently the
> > > pci_for_each_dma_alias() code traverses beyond this point and
> > > generates incorrect aliases due to the PCI and PCI/PCIe bridges
> > > above. This causes MSI-X interrupts and device DMA to fail since
> > > the SMMU and ITS tables are set up with incorrect IDs"
> > 
> > I haven't tried to figure out the MSI-X piece of this, but let me try
> > to come up with a concrete DMA example.  Assume this topology:
> > 
> >   00:00.0 bridge to [bus 01-1e]
> >   01:0a.0 bridge to [bus 04-05]
> >   04:00.0 [14e4:9000 or 9084] bridge to [bus 05] (XLATE_ROOT)
> >   05:00.0 endpoint
> > 
> > Assume 05:00.0 generates a DMA request.  Assume the top two bridges
> > are such that pci_for_each_dma_alias() includes them as well, so it
> > iterates through 05:00.0, 01:0a.0, and 00:00.0.
> > 
> > When the driver for 05:00.0 makes a DMA mapping, the current code
> > apparently makes an IOMMU mapping for requester ID 00:00.0 because
> > that's the last alias.  Obviously this doesn't work because the IOMMU
> > at 04:00.0 will see a requester ID of 05:00.0, not 00:00.0.
> > 
> > With this quirk, we'll omit 01:0a.0 and 00:00.0, so we'll make an
> > IOMMU mapping for requester ID 05:00.0, which will work fine.  I think
> > it would *also* work fine if we made IOMMU mappings for 05:00.0,
> > 01:0a.0, and 00:00.0.  The last two are unnecessary, but probably not
> > harmful.
>  
> Ok. The last two are not harmful but still incorrect. The bridges
> 00:00.0 and 01:0a.0 are outside the path that the PCI transactions take
> (they go from 04:00.0 to the SMMU).
> 
> > Now assume 05:00 is a multi-function device that has a DMA alias
> > quirk, e.g., see quirk_dma_func0_alias().  It has another function:
> > 
> >   05:00.3 endpoint
> > 
> > DMA from 05:00.3 may use a requester ID of either 05:00.3 or 05:00.0.
> > The driver makes a DMA mapping, pci_for_each_dma_alias() iterates
> > through 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, and we again map only
> > 00:00.0, which again doesn't work.
> > 
> > With this quirk, we create a single mapping for 05:00.0.  That will
> > work sometimes, but the device may also generate DMA with a requester
> > ID of 05:00.3, and that won't work.
> 
> Note that here, pci_for_each_dma_alias() works correctly with the quirk
> and incorrectly without the quirk. With the quirk, the callback function
> is invoked on 05:00.3 and 05:00.0 as expected.

With the quirk, pci_for_each_dma_alias() works correctly and iterates
through 05:00.3 and 05:00.0.  But the IOMMU code only pays attention to
the *last* alias, i.e., 05:00.0.  So this does not work.
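
The difference between a callback that keeps only the last alias and
one that records all of them can be sketched in userspace C. This is
only a model under assumptions: keep_last_alias(), collect_all_aliases()
and for_each_alias() are hypothetical names, not kernel functions, and
the alias list is supplied by hand rather than produced by
pci_for_each_dma_alias().

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 8

typedef int (*alias_fn)(uint16_t rid, void *data);

struct last_id { uint16_t rid; };
struct all_ids { uint16_t rid[MAX_IDS]; size_t count; };

/* Overwrites on every call, so only the final alias survives. */
static int keep_last_alias(uint16_t rid, void *data)
{
	struct last_id *last = data;

	last->rid = rid;
	return 0; /* continue the walk */
}

/* Records every alias so that each one can be given a mapping. */
static int collect_all_aliases(uint16_t rid, void *data)
{
	struct all_ids *ids = data;

	if (ids->count == MAX_IDS)
		return -1; /* out of space; stop the walk */
	ids->rid[ids->count++] = rid;
	return 0;
}

/* Hypothetical driver: feeds a fixed alias list to fn in order. */
static int for_each_alias(const uint16_t *aliases, size_t n,
			  alias_fn fn, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(aliases[i], data);

		if (ret)
			return ret;
	}
	return 0;
}
```

Fed the aliases 05:00.3 and 05:00.0 in that order, keep_last_alias()
ends up holding only 05:00.0, while collect_all_aliases() holds both, so
a mapping could be created for each requester ID the device may use.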

> Without the quirk the
> callback is invoked on 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, which
> includes aliases that are not valid.
> 
> The idea behind the quirk is to get pci_for_each_dma_alias() to work
> correctly on ThunderX2, and invoke the function only on the valid
> aliases. With the quirk, the code using it - like the SMMU code - gets
> the correct aliases to process, and does not have to deal with or
> filter out incorrect aliases.
> 
> I don't think it would be safe to assume that all callers of
> pci_for_each_dma_alias() can handle invalid aliases. Even in the IOMMU
> case, calculating stream IDs is only one usage; the usage for IOMMU
> groups also did not deal with invalid aliases correctly when I tried
> it. Then there are the MSI cases too. I had done some work on trying to
> filter out invalid aliases in the relevant code[1], but in my opinion,
> getting pci_for_each_dma_alias() to do the right thing is the correct
> solution.

I agree, we should fix this and it sounds like it's more than just an
optimization.  But I don't think the quirk is a complete fix, and the
changelog needs a short sketch of this discussion to make it clear
that this happens to fix some DMA faults, but others remain because
the current IOMMU code only maps one of the valid aliases.

So I think the only thing we need to move forward on this is a revised
changelog.  I propose something like this:

  On Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
  PCI topology is slightly unusual.  For a multi-node system, it looks
  like:

    00:00.0 [?? type] bridge to [bus 01-1e]
    01:0a.0 [?? type] bridge to [bus 04-05]
    04:00.0 [?? type] bridge to [bus 05] (ThunderX2 XLATE_ROOT)
    05:00.0 endpoint

  pci_for_each_dma_alias() assumes IOMMU translation is done at the
  root of the PCI hierarchy.  It generates 05:00.0 and <BB:DD.F> as
  DMA aliases for 05:00.0 because bus <BB> is a non-PCIe bus that
  doesn't carry the Requester ID.

  Because the ThunderX2 IOMMU is at 04:00.0, <BB:DD.F> is never a valid
  Requester ID.  This quirk stops alias generation at the XLATE_ROOT
  bridge so we won't generate <BB:DD.F>.

  The current IOMMU code only maps the last alias (this is a separate
  bug in itself).  Prior to this quirk, we only created IOMMU mappings
  for the invalid Requester ID <BB:DD.F>, which never matched any DMA
  transactions.
  
  With this quirk, we create IOMMU mappings for a valid Requester ID,
  which fixes devices with no aliases but leaves devices with aliases
  still broken.

I don't know the details of what type of bridges those are
(conventional PCI?  PCIe PCI-to-PCIe? etc?) and exactly what RIDs are
involved.  It'd be nice if you could fill those in.

Bjorn

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
@ 2017-04-12 23:18                     ` Bjorn Helgaas
  0 siblings, 0 replies; 64+ messages in thread
From: Bjorn Helgaas @ 2017-04-12 23:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 12, 2017 at 08:41:20PM +0000, Jayachandran C wrote:
> On Wed, Apr 12, 2017 at 02:11:38PM -0500, Bjorn Helgaas wrote:
> > On Wed, Apr 12, 2017 at 06:10:34PM +0000, Jayachandran C wrote:
> > > On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > > > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > > > [+cc Joerg]
> > > > > > 
> > > > > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > > > Hi Jayachandran,
> > > > > > > > 
> > > > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > > > On the Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > > > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > > > 
> > > > > > > > > [node level PCI bridges - one per node]
> > > > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > > > >     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
> > > > > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > > > >             [External PCI devices connected to PCIe links]
> > > > > > > > > 
> > > > > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > > > > node level bridges do not introduce an alias either.
> > > > > > > > > 
> > > > > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > > > > SoC PCI devices.
> > > > > > > > > 
> > > > > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > > > 
> > > > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > > > performance, avoid a crash, etc?
> > > > > > > 
> > > > > > > If this is for the commit message, I hope the following is ok:
> > > > > > > 
> > > > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > > > devices, and the IOMMU groups are setup correctly."
> > > > > > 
> > > > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > > > something like "without this change, we set up an IOMMU mapping for
> > > > > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > > > > results in an IOMMU fault"
> > > > > 
> > > > > Ok. I hope this would be better:
> > > > > 
> > > > > "Without this change, the last alias seen while traversing the PCI
> > > > > hierarchy will be used as the RID to generate the device ID for ITS
> > > > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > > > device to fail since the ITS expects to have translation tables based
> > > > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > > > SMMU translation tables"
> > > > 
> > > > This description is true, but I don't think it addresses the real
> > > > problem.  I think the real problem is that your IOMMU code doesn't
> > > > handle aliases correctly, and by ignoring these invalid aliases, we
> > > > happen to map an alias that works for the builtin devices.  But that's
> > > > only because we got lucky (those devices use a single RID and they're
> > > > not behind bridges that optionally take ownership).
> > > > 
> > > > It would make sense to me if we fixed the IOMMU code to map *all* the
> > > > aliases, which should be enough to make your devices work.  If we then
> > > > wanted to apply a patch like this on top, it would be simply an
> > > > optimization that avoids unnecessary IOMMU mappings.
> > > 
> > > The issue that the IOMMU code does not handle valid aliases is
> > > unrelated to what I am trying to fix. The quirk is to make sure
> > > that invalid aliases are not seen on ThunderX2 while doing
> > > pci_for_each_dma_alias().
> > > 
> > > The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
> > > where the SMMU (or ITS) is attached, i.e. at the bridge marked with
> > > PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
> > > for aliases above that point.
> > > 
> > > The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
> > > the on-chip devices are directly connected to the ITS (they do not
> > > use SMMU).
> > > 
> > > The top two levels of bridges are not real PCI bridges but just
> > > PCI bridge-like things that were added to tie the whole hierarchy
> > > together for configuration and enumeration. They do not handle
> > > PCI/PCIe transactions in the traditional sense.
> > > 
> > > I think my problem description is still not correct, maybe:
> > > "The SMMU (and ITS) expect the device tables to use the RID seen
> > > at the bridge they are associated with. Currently the
> > > pci_for_each_dma_alias() code traverses beyond this point and
> > > generates incorrect aliases due to the PCI and PCI/PCIe bridges
> > > above. This causes MSI-X interrupts and device DMA to fail since
> > > the SMMU and ITS tables are set up with incorrect IDs"
> > 
> > I haven't tried to figure out the MSI-X piece of this, but let me try
> > to come up with a concrete DMA example.  Assume this topology:
> > 
> >   00:00.0 bridge to [bus 01-1e]
> >   01:0a.0 bridge to [bus 04-05]
> >   04:00.0 [14e4:9000 or 9084] bridge to [bus 05] (XLATE_ROOT)
> >   05:00.0 endpoint
> > 
> > Assume 05:00.0 generates a DMA request.  Assume the top two bridges
> > are such that pci_for_each_dma_alias() includes them as well, so it
> > iterates through 05:00.0, 01:0a.0, and 00:00.0.
> > 
> > When the driver for 05:00.0 makes a DMA mapping, the current code
> > apparently makes an IOMMU mapping for requester ID 00:00.0 because
> > that's the last alias.  Obviously this doesn't work because the IOMMU
> > at 04:00.0 will see a requester ID of 05:00.0, not 00:00.0.
> > 
> > With this quirk, we'll omit 01:0a.0 and 00:00.0, so we'll make an
> > IOMMU mapping for requester ID 05:00.0, which will work fine.  I think
> > it would *also* work fine if we made IOMMU mappings for 05:00.0,
> > 01:0a.0, and 00:00.0.  The last two are unnecessary, but probably not
> > harmful.
>  
> Ok. The last two are not harmful but still incorrect. The bridges
> 0:0.0 and 1:a.0 are outside the path that the PCI transactions take
> (they go from 4:0.0 to the SMMU).
> 
> > Now assume 05:00 is a multi-function device that has a DMA alias
> > quirk, e.g., see quirk_dma_func0_alias().  It has another function:
> > 
> >   05:00.3 endpoint
> > 
> > DMA from 05:00.3 may use a requester ID of either 05:00.3 or 05:00.0.
> > The driver makes a DMA mapping, pci_for_each_dma_alias() iterates
> > through 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, and we again map only
> > 00:00.0, which again doesn't work.
> > 
> > With this quirk, we create a single mapping for 05:00.0.  That will
> > work sometimes, but the device may also generate DMA with a requester
> > ID of 05:00.3, and that won't work.
> 
> Note that here, pci_for_each_dma_alias() works correctly with the quirk
> and incorrectly without the quirk. With the quirk, the callback function
> is invoked on 05:00.3 and 05:00.0 as expected.

With the quirk, pci_for_each_dma_alias() works correctly and iterates
through 05:00.3 and 05:00.0.  But the IOMMU code only pays attention to
the *last* alias, i.e., 05:00.0.  So this does not work.
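
As a rough illustration of the last-alias point, here is a toy Python
model.  It is not the kernel's IOMMU code; the helper names
map_last_alias(), map_all_aliases() and dma_translates() are made up for
this sketch.  It assumes the quirk is applied, so the walk yields only
05:00.3 and 05:00.0:

```python
def map_last_alias(aliases):
    """Model a callback that overwrites its result on every alias,
    so only the final alias of the walk ends up mapped."""
    mapped = None
    for rid in aliases:
        mapped = rid
    return {mapped}


def map_all_aliases(aliases):
    """Model the alternative: create a mapping for every alias seen."""
    return set(aliases)


def dma_translates(mapped_rids, rid_on_wire):
    """A DMA request translates only if its requester ID was mapped."""
    return rid_on_wire in mapped_rids


# Aliases reported for 05:00.3 with the quirk applied (from the example
# in this thread): the function itself plus its function-0 alias.
WITH_QUIRK = ["05:00.3", "05:00.0"]

last_only = map_last_alias(WITH_QUIRK)
everything = map_all_aliases(WITH_QUIRK)
```

In this model, dma_translates(last_only, "05:00.3") is False while
dma_translates(everything, "05:00.3") is True, which is the remaining
failure for devices that have real aliases.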

> Without the quirk the
> call back is invoked on 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, which
> includes aliases that are not valid.
> 
> The idea behind the quirk is to get pci_for_each_dma_alias() to work
> correctly on ThunderX2, and invoke the function only on the valid
> aliases. With the quirk, the code using it - like the SMMU code - gets
> the correct aliases to process, and does not have to deal with or
> filter out incorrect aliases.
> 
> I don't think it would be safe to assume that all callers of
> pci_for_each_dma_alias will be ok to handle invalid aliases. Even in
> the IOMMU case, the case which calculates stream IDs is one usage, the
> usage for iommu groups also did not deal with invalid aliases correctly
> when I tried it. Then there are the MSI cases too, I had done some work
> on trying to filter out invalid aliases in the relevant code[1], but
> in my opinion, getting pci_for_each_dma_alias() to do the right thing is
> the correct solution.

I agree, we should fix this and it sounds like it's more than just an
optimization.  But I don't think the quirk is a complete fix, and the
changelog needs a short sketch of this discussion to make it clear
that this happens to fix some DMA faults, but others remain because
the current IOMMU code only maps one of the valid aliases.

So I think the only thing we need to move forward on this is a revised
changelog.  I propose something like this:

  On Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
  PCI topology is slightly unusual.  For a multi-node system, it looks
  like:

    00:00.0 [?? type] bridge to [bus 01-1e]
    01:0a.0 [?? type] bridge to [bus 04-05]
    04:00.0 [?? type] bridge to [bus 05] (ThunderX2 XLATE_ROOT)
    05:00.0 endpoint

  pci_for_each_dma_alias() assumes IOMMU translation is done at the
  root of the PCI hierarchy.  It generates 05:00.0 and <BB:DD.F> as
  DMA aliases for 05:00.0 because bus <BB> is a non-PCIe bus that
  doesn't carry the Requester ID.

  Because the ThunderX2 IOMMU is at 04:00.0, <BB:DD.F> is never a valid
  Requester ID.  This quirk stops alias generation at the XLATE_ROOT
  bridge so we won't generate <BB:DD.F>.

  The current IOMMU code only maps the last alias (this is a separate
  bug in itself).  Prior to this quirk, we only created IOMMU mappings
  for the invalid Requester ID <BB:DD.F>, which never matched any DMA
  transactions.
  
  With this quirk, we create IOMMU mappings for a valid Requester ID,
  which fixes devices with no aliases but leaves devices with aliases
  still broken.
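
The walk described in the changelog text above can be sketched as a
small Python model.  This is only an illustration under simplifying
assumptions, not the kernel implementation: the Dev class, its
adds_alias field and the dma_aliases() helper are hypothetical, and the
two upper bridges are simply assumed to be of a type that introduces an
alias (their real types are not stated in this thread).  The xlate_root
field stands in for PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dev:
    bdf: str                           # bus:device.function as a string
    parent: Optional["Dev"] = None     # upstream bridge, None at the top
    adds_alias: bool = False           # assumption: bridge introduces an alias
    xlate_root: bool = False           # stands in for the XLATE_ROOT flag
    func0_alias: Optional[str] = None  # models a function-0 DMA alias quirk


def dma_aliases(dev: Dev, honor_xlate_root: bool) -> List[str]:
    """Collect requester IDs from the device upwards, in walk order."""
    aliases = [dev.bdf]
    if dev.func0_alias:
        aliases.append(dev.func0_alias)
    bridge = dev.parent
    while bridge is not None:
        # With the quirk, stop at the bridge where translation happens.
        if honor_xlate_root and bridge.xlate_root:
            break
        if bridge.adds_alias:
            aliases.append(bridge.bdf)
        bridge = bridge.parent
    return aliases


def example_endpoint(bdf: str, func0_alias: Optional[str] = None) -> Dev:
    """Build the four-level example topology used in this thread."""
    node = Dev("00:00.0", adds_alias=True)
    glue = Dev("01:0a.0", parent=node, adds_alias=True)
    root_port = Dev("04:00.0", parent=glue, xlate_root=True)
    return Dev(bdf, parent=root_port, func0_alias=func0_alias)


ep = example_endpoint("05:00.0")
without_quirk = dma_aliases(ep, honor_xlate_root=False)
with_quirk = dma_aliases(ep, honor_xlate_root=True)
```

Under these assumptions, without_quirk is 05:00.0, 01:0a.0, 00:00.0 and
with_quirk is just 05:00.0.  For a function such as 05:00.3 with a
function-0 alias, the quirk leaves 05:00.3 and 05:00.0, both of which
are valid.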

I don't know the details of what type of bridges those are
(conventional PCI?  PCIe PCI-to-PCIe? etc?) and exactly what RIDs are
involved.  It'd be nice if you could fill those in.

Bjorn

* Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
  2017-04-04 14:28         ` Robin Murphy
@ 2017-04-13  6:43           ` Jon Masters
  -1 siblings, 0 replies; 64+ messages in thread
From: Jon Masters @ 2017-04-13  6:43 UTC (permalink / raw)
  To: Robin Murphy, Jayachandran C
  Cc: Bjorn Helgaas, linux-pci, Alex Williamson, iommu, linux-arm-kernel

On 04/04/2017 10:28 AM, Robin Murphy wrote:

> So (at the risk of Jon mooing at me)

moooooooooooo


-- 
Computer Architect | Sent from my Fedora powered laptop
