All of lore.kernel.org
 help / color / mirror / Atom feed
* Make PowerNV IOMMU group setup saner (and fix it for hotpug)
@ 2020-04-06  3:07 Oliver O'Halloran
  2020-04-06  3:07 ` [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation Oliver O'Halloran
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik

Currently on PowerNV the IOMMU group of a device is initialised in
boot-time fixup which runs after devices are probed. Because this is
only run at boot time hotplugged devices do not recieve an iommu group
assignment which prevents them from being passed through to a guest.

This series fixes that by moving the point where IOMMU groups are
registered to when we configure DMA for a PE, and moves the point where
we add a device to the PE's IOMMU group into the per-device DMA setup
callback for IODA phbs (pnv_pci_ioda_dma_dev_setup()). This change means
that we'll do group setup for hotplugged devices and that we can remove
the hack we have for VFs which are currently added to their group
via a bus notifier.

With this change there's no longer any per-device setup that needs to
run in a fixup for ordinary PCI devices. The exception is, as per usual,
NVLink devices. For those the GPU and any of it's NVLink devices need
to be in a "compound" IOMMU group which keeps the DMA address spaces
of each device in sync with it's attached devices. As a result that
setup can only be done when both the NVLink devices and the GPU device
has been probed, so that setup is still done in the fixup. Sucks, but
it's still an improvement.

Boot tested on a witherspoon with 6xGPUs and it didn't crash so it must
be good.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:48   ` Alexey Kardashevskiy
  2020-06-09  5:29   ` Michael Ellerman
  2020-04-06  3:07 ` [PATCH 2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config Oliver O'Halloran
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Oliver O'Halloran

Re-work the control flow a bit so what's going on is a little clearer.
This also ensures the table_group is only initialised once in the P9
case. This shouldn't be a functional change since all the GPU PCI
devices should have the same table_group configuration, but it does
look strange.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 46 +++++++++++-------------
 1 file changed, 21 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index b95b9e3c4c98..de617549c9a3 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -427,7 +427,7 @@ static void pnv_comp_attach_table_group(struct npu_comp *npucomp,
 
 struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 {
-	struct iommu_table_group *table_group;
+	struct iommu_table_group *compound_group;
 	struct npu_comp *npucomp;
 	struct pci_dev *gpdev = NULL;
 	struct pci_controller *hose;
@@ -446,36 +446,32 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 	hose = pci_bus_to_host(npdev->bus);
 
 	if (hose->npu) {
-		table_group = &hose->npu->npucomp.table_group;
-
-		if (!table_group->group) {
-			table_group->ops = &pnv_npu_peers_ops;
-			iommu_register_group(table_group,
-					hose->global_number,
-					pe->pe_number);
-		}
+		/* P9 case: compound group is per-NPU (all gpus, all links) */
+		npucomp = &hose->npu->npucomp;
 	} else {
-		/* Create a group for 1 GPU and attached NPUs for POWER8 */
-		pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL);
-		table_group = &pe->npucomp->table_group;
-		table_group->ops = &pnv_npu_peers_ops;
-		iommu_register_group(table_group, hose->global_number,
-				pe->pe_number);
+		/* P8 case: Compound group is per-GPU (1 gpu, 2 links) */
+		npucomp = pe->npucomp = kzalloc(sizeof(*npucomp), GFP_KERNEL);
 	}
 
-	/* Steal capabilities from a GPU PE */
-	table_group->max_dynamic_windows_supported =
-		pe->table_group.max_dynamic_windows_supported;
-	table_group->tce32_start = pe->table_group.tce32_start;
-	table_group->tce32_size = pe->table_group.tce32_size;
-	table_group->max_levels = pe->table_group.max_levels;
-	if (!table_group->pgsizes)
-		table_group->pgsizes = pe->table_group.pgsizes;
+	compound_group = &npucomp->table_group;
+	if (!compound_group->group) {
+		compound_group->ops = &pnv_npu_peers_ops;
+		iommu_register_group(compound_group, hose->global_number,
+				pe->pe_number);
+
+		/* Steal capabilities from a GPU PE */
+		compound_group->max_dynamic_windows_supported =
+			pe->table_group.max_dynamic_windows_supported;
+		compound_group->tce32_start = pe->table_group.tce32_start;
+		compound_group->tce32_size = pe->table_group.tce32_size;
+		compound_group->max_levels = pe->table_group.max_levels;
+		if (!compound_group->pgsizes)
+			compound_group->pgsizes = pe->table_group.pgsizes;
+	}
 
-	npucomp = container_of(table_group, struct npu_comp, table_group);
 	pnv_comp_attach_table_group(npucomp, pe);
 
-	return table_group;
+	return compound_group;
 }
 
 struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
  2020-04-06  3:07 ` [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:49   ` Alexey Kardashevskiy
  2020-04-06  3:07 ` [PATCH 3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup Oliver O'Halloran
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Oliver O'Halloran

In pnv_ioda_setup_vf_PE() we register an iommu group for the VF PE
then call pnv_ioda_setup_bus_iommu_group() to add devices to that group.
However, this function is called before the VFs are scanned so there's
no devices to add.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 57d3a6af1d52..2c340504fa77 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1622,7 +1622,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 #ifdef CONFIG_IOMMU_API
 		iommu_register_group(&pe->table_group,
 				pe->phb->hose->global_number, pe->pe_number);
-		pnv_ioda_setup_bus_iommu_group(pe, &pe->table_group, NULL);
 #endif
 	}
 }
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
  2020-04-06  3:07 ` [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation Oliver O'Halloran
  2020-04-06  3:07 ` [PATCH 2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:51   ` Alexey Kardashevskiy
  2020-04-06  3:07 ` [PATCH 4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup() Oliver O'Halloran
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Reza Arbab, Oliver O'Halloran, Alistair Popple

Move the registration of IOMMU groups out of the post-phb init fixup and
into when we configure DMA for a PE. For most devices this doesn't
result in any functional changes, but for NVLink attached GPUs it
requires a bit of care. When the GPU is probed an IOMMU group would be
created for the PE that contains it. We need to ensure that group is
removed before we add the PE to the compound group that's used to keep
the translations see by the PCIe and NVLink buses the same.

No functional changes. Probably.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Reza Arbab <arbab@linux.ibm.com>
Cc: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/npu-dma.c  |  9 +++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++++++----------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index de617549c9a3..4fbbdfa8b327 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -469,6 +469,15 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 			compound_group->pgsizes = pe->table_group.pgsizes;
 	}
 
+       /*
+	* I'm not sure this is strictly required, but it's probably a good idea
+	* since the table_group for the PE is going to be attached to the
+	* compound table group. If we leave the PE's iommu group active then
+	* we might have the same table_group being modifiable via two sepeate
+	* iommu groups.
+	*/
+	iommu_group_put(pe->table_group.group);
+
 	pnv_comp_attach_table_group(npucomp, pe);
 
 	return compound_group;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2c340504fa77..cf0aaef1b8fa 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1619,10 +1619,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		}
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
-#ifdef CONFIG_IOMMU_API
-		iommu_register_group(&pe->table_group,
-				pe->phb->hose->global_number, pe->pe_number);
-#endif
 	}
 }
 
@@ -2661,9 +2657,6 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 					continue;
 
 				table_group = &pe->table_group;
-				iommu_register_group(&pe->table_group,
-						pe->phb->hose->global_number,
-						pe->pe_number);
 			}
 			pnv_ioda_setup_bus_iommu_group(pe, table_group,
 					pe->pbus);
@@ -2748,14 +2741,17 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 			IOMMU_TABLE_GROUP_MAX_TABLES;
 	pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
 	pe->table_group.pgsizes = pnv_ioda_parse_tce_sizes(phb);
-#ifdef CONFIG_IOMMU_API
-	pe->table_group.ops = &pnv_pci_ioda2_ops;
-#endif
 
 	rc = pnv_pci_ioda2_setup_default_config(pe);
 	if (rc)
 		return;
 
+#ifdef CONFIG_IOMMU_API
+	pe->table_group.ops = &pnv_pci_ioda2_ops;
+	iommu_register_group(&pe->table_group, phb->hose->global_number,
+			     pe->pe_number);
+#endif
+
 	if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 }
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup()
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
                   ` (2 preceding siblings ...)
  2020-04-06  3:07 ` [PATCH 3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:51   ` Alexey Kardashevskiy
  2020-04-06  3:07 ` [PATCH 5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup Oliver O'Halloran
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Oliver O'Halloran

Historically adding devices to their respective iommu group has been
handled by the post-init phb fixup for most devices. This was done
because:

1) The IOMMU group is tied to the PE (usually) so we can only setup the
   iommu groups after we've done resource allocation since BAR location
   determines the device's PE, and:
2) The sysfs directory for the pci_dev needs to be available since
   iommu_add_device() wants to add an attribute for the iommu group.

However, since commit 30d87ef8b38d ("powerpc/pci: Fix
pcibios_setup_device() ordering") both conditions are met when
hose->ops->dma_dev_setup() is called so there's no real need to do
this in the fixup.

Moving the call to iommu_add_device() into pnv_pci_ioda_dma_setup_dev()
is a nice cleanup since it puts all the per-device IOMMU setup into one
place. It also results in all (non-nvlink) devices getting their iommu
group via a common path rather than relying on the bus notifier hack
in pnv_tce_iommu_bus_notifier() to handle the adding VFs and
hotplugged devices to their group.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/npu-dma.c  |  8 ++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 47 +++++++----------------
 arch/powerpc/platforms/powernv/pci.c      | 20 ----------
 3 files changed, 21 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 4fbbdfa8b327..df27b8d7e78f 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -469,6 +469,12 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 			compound_group->pgsizes = pe->table_group.pgsizes;
 	}
 
+	/*
+	 * The gpu would have been added to the iommu group that's created
+	 * for the PE. Pull it out now.
+	 */
+	iommu_del_device(&gpdev->dev);
+
        /*
 	* I'm not sure this is strictly required, but it's probably a good idea
 	* since the table_group for the PE is going to be attached to the
@@ -478,7 +484,9 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 	*/
 	iommu_group_put(pe->table_group.group);
 
+	/* now put the GPU into the compound group */
 	pnv_comp_attach_table_group(npucomp, pe);
+	iommu_add_device(compound_group, &gpdev->dev);
 
 	return compound_group;
 }
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cf0aaef1b8fa..9198b7882b57 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1774,12 +1774,10 @@ static void pnv_pci_ioda_dma_dev_setup(struct pci_dev *pdev)
 	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
 	pdev->dev.archdata.dma_offset = pe->tce_bypass_base;
 	set_iommu_table_base(&pdev->dev, pe->table_group.tables[0]);
-	/*
-	 * Note: iommu_add_device() will fail here as
-	 * for physical PE: the device is already added by now;
-	 * for virtual PE: sysfs entries are not ready yet and
-	 * tce_iommu_bus_notifier will add the device to a group later.
-	 */
+
+	/* PEs with a DMA weight of zero won't have a group */
+	if (pe->table_group.group)
+		iommu_add_device(&pe->table_group, &pdev->dev);
 }
 
 /*
@@ -2628,39 +2626,20 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 	struct pnv_ioda_pe *pe;
 
 	/*
-	 * There are 4 types of PEs:
-	 * - PNV_IODA_PE_BUS: a downstream port with an adapter,
-	 *   created from pnv_pci_setup_bridge();
-	 * - PNV_IODA_PE_BUS_ALL: a PCI-PCIX bridge with devices behind it,
-	 *   created from pnv_pci_setup_bridge();
-	 * - PNV_IODA_PE_VF: a SRIOV virtual function,
-	 *   created from pnv_pcibios_sriov_enable();
-	 * - PNV_IODA_PE_DEV: an NPU or OCAPI device,
-	 *   created from pnv_pci_ioda_fixup().
+	 * For non-nvlink devices the IOMMU group is registered when the PE is
+	 * configured and devices are added to the group when the per-device
+	 * DMA setup is run. That's done in hose->ops.dma_dev_setup() which is
+	 * only initialise for "normal" IODA PHBs.
 	 *
-	 * Normally a PE is represented by an IOMMU group, however for
-	 * devices with side channels the groups need to be more strict.
+	 * For NVLink devices we need to ensure the NVLinks and the GPU end up
+	 * in the same IOMMU group, so that's handled here.
 	 */
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		if (phb->type == PNV_PHB_NPU_NVLINK ||
-		    phb->type == PNV_PHB_NPU_OCAPI)
-			continue;
-
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			struct iommu_table_group *table_group;
-
-			table_group = pnv_try_setup_npu_table_group(pe);
-			if (!table_group) {
-				if (!pnv_pci_ioda_pe_dma_weight(pe))
-					continue;
-
-				table_group = &pe->table_group;
-			}
-			pnv_ioda_setup_bus_iommu_group(pe, table_group,
-					pe->pbus);
-		}
+		if (phb->type == PNV_PHB_IODA2)
+			list_for_each_entry(pe, &phb->ioda.pe_list, list)
+				pnv_try_setup_npu_table_group(pe);
 	}
 
 	/*
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 5bf818246339..091fe1cf386b 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -955,28 +955,8 @@ static int pnv_tce_iommu_bus_notifier(struct notifier_block *nb,
 		unsigned long action, void *data)
 {
 	struct device *dev = data;
-	struct pci_dev *pdev;
-	struct pci_dn *pdn;
-	struct pnv_ioda_pe *pe;
-	struct pci_controller *hose;
-	struct pnv_phb *phb;
 
 	switch (action) {
-	case BUS_NOTIFY_ADD_DEVICE:
-		pdev = to_pci_dev(dev);
-		pdn = pci_get_pdn(pdev);
-		hose = pci_bus_to_host(pdev->bus);
-		phb = hose->private_data;
-
-		WARN_ON_ONCE(!phb);
-		if (!pdn || pdn->pe_number == IODA_INVALID_PE || !phb)
-			return 0;
-
-		pe = &phb->ioda.pe_array[pdn->pe_number];
-		if (!pe->table_group.group)
-			return 0;
-		iommu_add_device(&pe->table_group, dev);
-		return 0;
 	case BUS_NOTIFY_DEL_DEVICE:
 		iommu_del_device(dev);
 		return 0;
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
                   ` (3 preceding siblings ...)
  2020-04-06  3:07 ` [PATCH 4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup() Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:53   ` Alexey Kardashevskiy
  2020-04-06  3:07 ` [PATCH 6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c Oliver O'Halloran
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Oliver O'Halloran

No longer used.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 32 -----------------------
 1 file changed, 32 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9198b7882b57..8b45b8e561e9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1550,11 +1550,6 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 				       struct pnv_ioda_pe *pe);
-#ifdef CONFIG_IOMMU_API
-static void pnv_ioda_setup_bus_iommu_group(struct pnv_ioda_pe *pe,
-		struct iommu_table_group *table_group, struct pci_bus *bus);
-
-#endif
 static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 {
 	struct pci_bus        *bus;
@@ -2590,33 +2585,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
 	.release_ownership = pnv_ioda2_release_ownership,
 };
 
-static void pnv_ioda_setup_bus_iommu_group_add_devices(struct pnv_ioda_pe *pe,
-		struct iommu_table_group *table_group,
-		struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		iommu_add_device(table_group, &dev->dev);
-
-		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-			pnv_ioda_setup_bus_iommu_group_add_devices(pe,
-					table_group, dev->subordinate);
-	}
-}
-
-static void pnv_ioda_setup_bus_iommu_group(struct pnv_ioda_pe *pe,
-		struct iommu_table_group *table_group, struct pci_bus *bus)
-{
-
-	if (pe->flags & PNV_IODA_PE_DEV)
-		iommu_add_device(table_group, &pe->pdev->dev);
-
-	if ((pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) || bus)
-		pnv_ioda_setup_bus_iommu_group_add_devices(pe, table_group,
-				bus);
-}
-
 static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
 
 static void pnv_pci_ioda_setup_iommu_api(void)
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
                   ` (4 preceding siblings ...)
  2020-04-06  3:07 ` [PATCH 5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:53   ` Alexey Kardashevskiy
  2020-04-06  3:07 ` [PATCH 7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c Oliver O'Halloran
  2020-04-06  9:56 ` Make PowerNV IOMMU group setup saner (and fix it for hotpug) Alexey Kardashevskiy
  7 siblings, 1 reply; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Oliver O'Halloran

Move it in with the rest of the TCE wrangling rather than carting around
a static prototype in pci-ioda.c

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 28 +++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c     | 30 -------------------
 arch/powerpc/platforms/powernv/pci.h          |  2 ++
 3 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 5dc6847d5f4c..f923359d8afc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -17,6 +17,34 @@
 #include <asm/tce.h>
 #include "pci.h"
 
+unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
+{
+	struct pci_controller *hose = phb->hose;
+	struct device_node *dn = hose->dn;
+	unsigned long mask = 0;
+	int i, rc, count;
+	u32 val;
+
+	count = of_property_count_u32_elems(dn, "ibm,supported-tce-sizes");
+	if (count <= 0) {
+		mask = SZ_4K | SZ_64K;
+		/* Add 16M for POWER8 by default */
+		if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+				!cpu_has_feature(CPU_FTR_ARCH_300))
+			mask |= SZ_16M | SZ_256M;
+		return mask;
+	}
+
+	for (i = 0; i < count; i++) {
+		rc = of_property_read_u32_index(dn, "ibm,supported-tce-sizes",
+						i, &val);
+		if (rc == 0)
+			mask |= 1ULL << val;
+	}
+
+	return mask;
+}
+
 void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 		void *tce_mem, u64 tce_size,
 		u64 dma_offset, unsigned int page_shift)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8b45b8e561e9..c020ade3a846 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2585,8 +2585,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
 	.release_ownership = pnv_ioda2_release_ownership,
 };
 
-static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
-
 static void pnv_pci_ioda_setup_iommu_api(void)
 {
 	struct pci_controller *hose;
@@ -2638,34 +2636,6 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 static void pnv_pci_ioda_setup_iommu_api(void) { };
 #endif
 
-static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
-{
-	struct pci_controller *hose = phb->hose;
-	struct device_node *dn = hose->dn;
-	unsigned long mask = 0;
-	int i, rc, count;
-	u32 val;
-
-	count = of_property_count_u32_elems(dn, "ibm,supported-tce-sizes");
-	if (count <= 0) {
-		mask = SZ_4K | SZ_64K;
-		/* Add 16M for POWER8 by default */
-		if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
-				!cpu_has_feature(CPU_FTR_ARCH_300))
-			mask |= SZ_16M | SZ_256M;
-		return mask;
-	}
-
-	for (i = 0; i < count; i++) {
-		rc = of_property_read_u32_index(dn, "ibm,supported-tce-sizes",
-						i, &val);
-		if (rc == 0)
-			mask |= 1ULL << val;
-	}
-
-	return mask;
-}
-
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 				       struct pnv_ioda_pe *pe)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index d3bbdeab3a32..0c5845a1f05d 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -244,4 +244,6 @@ extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 		void *tce_mem, u64 tce_size,
 		u64 dma_offset, unsigned int page_shift);
 
+extern unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
+
 #endif /* __POWERNV_PCI_H */
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
                   ` (5 preceding siblings ...)
  2020-04-06  3:07 ` [PATCH 6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c Oliver O'Halloran
@ 2020-04-06  3:07 ` Oliver O'Halloran
  2020-04-06  9:54   ` Alexey Kardashevskiy
  2020-04-06  9:56 ` Make PowerNV IOMMU group setup saner (and fix it for hotpug) Alexey Kardashevskiy
  7 siblings, 1 reply; 17+ messages in thread
From: Oliver O'Halloran @ 2020-04-06  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Oliver O'Halloran

The NVlink IOMMU group setup is only relevant to NVLink devices so move
it into the NPU containment zone. This let us remove some prototypes in
pci.h and staticfy some function definitions.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/powernv/npu-dma.c  | 54 +++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 60 +++--------------------
 arch/powerpc/platforms/powernv/pci.h      |  6 +--
 3 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index df27b8d7e78f..abeaa533b976 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -15,6 +15,7 @@
 
 #include <asm/debugfs.h>
 #include <asm/powernv.h>
+#include <asm/ppc-pci.h>
 #include <asm/opal.h>
 
 #include "pci.h"
@@ -425,7 +426,8 @@ static void pnv_comp_attach_table_group(struct npu_comp *npucomp,
 	++npucomp->pe_num;
 }
 
-struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
+static struct iommu_table_group *
+	pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 {
 	struct iommu_table_group *compound_group;
 	struct npu_comp *npucomp;
@@ -491,7 +493,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 	return compound_group;
 }
 
-struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
+static struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
 {
 	struct iommu_table_group *table_group;
 	struct npu_comp *npucomp;
@@ -534,6 +536,54 @@ struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
 
 	return table_group;
 }
+
+void pnv_pci_npu_setup_iommu_groups(void)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct pnv_ioda_pe *pe;
+
+	/*
+	 * For non-nvlink devices the IOMMU group is registered when the PE is
+	 * configured and devices are added to the group when the per-device
+	 * DMA setup is run. That's done in hose->ops.dma_dev_setup() which is
+	 * only initialise for "normal" IODA PHBs.
+	 *
+	 * For NVLink devices we need to ensure the NVLinks and the GPU end up
+	 * in the same IOMMU group, so that's handled here.
+	 */
+	list_for_each_entry(hose, &hose_list, list_node) {
+		phb = hose->private_data;
+
+		if (phb->type == PNV_PHB_IODA2)
+			list_for_each_entry(pe, &phb->ioda.pe_list, list)
+				pnv_try_setup_npu_table_group(pe);
+	}
+
+	/*
+	 * Now we have all PHBs discovered, time to add NPU devices to
+	 * the corresponding IOMMU groups.
+	 */
+	list_for_each_entry(hose, &hose_list, list_node) {
+		unsigned long  pgsizes;
+
+		phb = hose->private_data;
+
+		if (phb->type != PNV_PHB_NPU_NVLINK)
+			continue;
+
+		pgsizes = pnv_ioda_parse_tce_sizes(phb);
+		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+			/*
+			 * IODA2 bridges get this set up from
+			 * pci_controller_ops::setup_bridge but NPU bridges
+			 * do not have this hook defined so we do it here.
+			 */
+			pe->table_group.pgsizes = pgsizes;
+			pnv_npu_compound_attach(pe);
+		}
+	}
+}
 #endif /* CONFIG_IOMMU_API */
 
 int pnv_npu2_init(struct pci_controller *hose)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c020ade3a846..dba0c2c09f61 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1288,7 +1288,7 @@ static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
 		pnv_ioda_setup_npu_PE(pdev);
 }
 
-static void pnv_pci_ioda_setup_PEs(void)
+static void pnv_pci_ioda_setup_nvlink(void)
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
@@ -1312,6 +1312,11 @@ static void pnv_pci_ioda_setup_PEs(void)
 		list_for_each_entry(pe, &phb->ioda.pe_list, list)
 			pnv_npu2_map_lpar(pe, MSR_DR | MSR_PR | MSR_HV);
 	}
+
+#ifdef CONFIG_IOMMU_API
+	/* setup iommu groups so we can do nvlink pass-thru */
+	pnv_pci_npu_setup_iommu_groups();
+#endif
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -2584,56 +2589,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
 	.take_ownership = pnv_ioda2_take_ownership,
 	.release_ownership = pnv_ioda2_release_ownership,
 };
-
-static void pnv_pci_ioda_setup_iommu_api(void)
-{
-	struct pci_controller *hose;
-	struct pnv_phb *phb;
-	struct pnv_ioda_pe *pe;
-
-	/*
-	 * For non-nvlink devices the IOMMU group is registered when the PE is
-	 * configured and devices are added to the group when the per-device
-	 * DMA setup is run. That's done in hose->ops.dma_dev_setup() which is
-	 * only initialise for "normal" IODA PHBs.
-	 *
-	 * For NVLink devices we need to ensure the NVLinks and the GPU end up
-	 * in the same IOMMU group, so that's handled here.
-	 */
-	list_for_each_entry(hose, &hose_list, list_node) {
-		phb = hose->private_data;
-
-		if (phb->type == PNV_PHB_IODA2)
-			list_for_each_entry(pe, &phb->ioda.pe_list, list)
-				pnv_try_setup_npu_table_group(pe);
-	}
-
-	/*
-	 * Now we have all PHBs discovered, time to add NPU devices to
-	 * the corresponding IOMMU groups.
-	 */
-	list_for_each_entry(hose, &hose_list, list_node) {
-		unsigned long  pgsizes;
-
-		phb = hose->private_data;
-
-		if (phb->type != PNV_PHB_NPU_NVLINK)
-			continue;
-
-		pgsizes = pnv_ioda_parse_tce_sizes(phb);
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			/*
-			 * IODA2 bridges get this set up from
-			 * pci_controller_ops::setup_bridge but NPU bridges
-			 * do not have this hook defined so we do it here.
-			 */
-			pe->table_group.pgsizes = pgsizes;
-			pnv_npu_compound_attach(pe);
-		}
-	}
-}
-#else /* !CONFIG_IOMMU_API */
-static void pnv_pci_ioda_setup_iommu_api(void) { };
 #endif
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -3132,8 +3087,7 @@ static void pnv_pci_enable_bridges(void)
 
 static void pnv_pci_ioda_fixup(void)
 {
-	pnv_pci_ioda_setup_PEs();
-	pnv_pci_ioda_setup_iommu_api();
+	pnv_pci_ioda_setup_nvlink();
 	pnv_pci_ioda_create_dbgfs();
 
 	pnv_pci_enable_bridges();
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 0c5845a1f05d..20941ef2706e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -209,11 +209,7 @@ extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 /* Nvlink functions */
 extern void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass);
 extern void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
-extern struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe);
-extern struct iommu_table_group *pnv_try_setup_npu_table_group(
-		struct pnv_ioda_pe *pe);
-extern struct iommu_table_group *pnv_npu_compound_attach(
-		struct pnv_ioda_pe *pe);
+extern void pnv_pci_npu_setup_iommu_groups(void);
 
 /* pci-ioda-tce.c */
 #define POWERNV_IOMMU_DEFAULT_LEVELS	2
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation
  2020-04-06  3:07 ` [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation Oliver O'Halloran
@ 2020-04-06  9:48   ` Alexey Kardashevskiy
  2020-06-09  5:29   ` Michael Ellerman
  1 sibling, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:48 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> Re-work the control flow a bit so what's going on is a little clearer.
> This also ensures the table_group is only initialised once in the P9
> case. This shouldn't be a functional change since all the GPU PCI
> devices should have the same table_group configuration, but it does
> look strange.
> 
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 46 +++++++++++-------------
>  1 file changed, 21 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index b95b9e3c4c98..de617549c9a3 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -427,7 +427,7 @@ static void pnv_comp_attach_table_group(struct npu_comp *npucomp,
>  
>  struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  {
> -	struct iommu_table_group *table_group;
> +	struct iommu_table_group *compound_group;
>  	struct npu_comp *npucomp;
>  	struct pci_dev *gpdev = NULL;
>  	struct pci_controller *hose;
> @@ -446,36 +446,32 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  	hose = pci_bus_to_host(npdev->bus);
>  
>  	if (hose->npu) {
> -		table_group = &hose->npu->npucomp.table_group;
> -
> -		if (!table_group->group) {
> -			table_group->ops = &pnv_npu_peers_ops;
> -			iommu_register_group(table_group,
> -					hose->global_number,
> -					pe->pe_number);
> -		}
> +		/* P9 case: compound group is per-NPU (all gpus, all links) */
> +		npucomp = &hose->npu->npucomp;
>  	} else {
> -		/* Create a group for 1 GPU and attached NPUs for POWER8 */
> -		pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL);
> -		table_group = &pe->npucomp->table_group;
> -		table_group->ops = &pnv_npu_peers_ops;
> -		iommu_register_group(table_group, hose->global_number,
> -				pe->pe_number);
> +		/* P8 case: Compound group is per-GPU (1 gpu, 2 links) */
> +		npucomp = pe->npucomp = kzalloc(sizeof(*npucomp), GFP_KERNEL);
>  	}
>  
> -	/* Steal capabilities from a GPU PE */
> -	table_group->max_dynamic_windows_supported =
> -		pe->table_group.max_dynamic_windows_supported;
> -	table_group->tce32_start = pe->table_group.tce32_start;
> -	table_group->tce32_size = pe->table_group.tce32_size;
> -	table_group->max_levels = pe->table_group.max_levels;
> -	if (!table_group->pgsizes)
> -		table_group->pgsizes = pe->table_group.pgsizes;
> +	compound_group = &npucomp->table_group;
> +	if (!compound_group->group) {
> +		compound_group->ops = &pnv_npu_peers_ops;
> +		iommu_register_group(compound_group, hose->global_number,
> +				pe->pe_number);
> +
> +		/* Steal capabilities from a GPU PE */
> +		compound_group->max_dynamic_windows_supported =
> +			pe->table_group.max_dynamic_windows_supported;
> +		compound_group->tce32_start = pe->table_group.tce32_start;
> +		compound_group->tce32_size = pe->table_group.tce32_size;
> +		compound_group->max_levels = pe->table_group.max_levels;
> +		if (!compound_group->pgsizes)
> +			compound_group->pgsizes = pe->table_group.pgsizes;
> +	}
>  
> -	npucomp = container_of(table_group, struct npu_comp, table_group);
>  	pnv_comp_attach_table_group(npucomp, pe);
>  
> -	return table_group;
> +	return compound_group;
>  }
>  
>  struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config
  2020-04-06  3:07 ` [PATCH 2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config Oliver O'Halloran
@ 2020-04-06  9:49   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:49 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> In pnv_ioda_setup_vf_PE() we register an iommu group for the VF PE
> then call pnv_ioda_setup_bus_iommu_group() to add devices to that group.
> However, this function is called before the VFs are scanned so there's
> no devices to add.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 57d3a6af1d52..2c340504fa77 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1622,7 +1622,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>  #ifdef CONFIG_IOMMU_API
>  		iommu_register_group(&pe->table_group,
>  				pe->phb->hose->global_number, pe->pe_number);
> -		pnv_ioda_setup_bus_iommu_group(pe, &pe->table_group, NULL);
>  #endif
>  	}
>  }
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup
  2020-04-06  3:07 ` [PATCH 3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup Oliver O'Halloran
@ 2020-04-06  9:51   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:51 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: Alistair Popple, Reza Arbab



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> Move the registration of IOMMU groups out of the post-phb init fixup and
> into when we configure DMA for a PE. For most devices this doesn't
> result in any functional changes, but for NVLink attached GPUs it
> requires a bit of care. When the GPU is probed an IOMMU group would be
> created for the PE that contains it. We need to ensure that group is
> removed before we add the PE to the compound group that's used to keep
> the translations see by the PCIe and NVLink buses the same.
> 
> No functional changes. Probably.
> 
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Reza Arbab <arbab@linux.ibm.com>
> Cc: Alistair Popple <alistair@popple.id.au>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c  |  9 +++++++++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++++++----------
>  2 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index de617549c9a3..4fbbdfa8b327 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -469,6 +469,15 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  			compound_group->pgsizes = pe->table_group.pgsizes;
>  	}
>  
> +       /*
> +	* I'm not sure this is strictly required, but it's probably a good idea
> +	* since the table_group for the PE is going to be attached to the


This seems just a right thing to do so I suggest changing the comment
from "not sure" to "we are going to put GPU to a compound group and
won't need the individual group".

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> +	* compound table group. If we leave the PE's iommu group active then
> +	* we might have the same table_group being modifiable via two sepeate
> +	* iommu groups.
> +	*/
> +	iommu_group_put(pe->table_group.group);
> +
>  	pnv_comp_attach_table_group(npucomp, pe);
>  
>  	return compound_group;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 2c340504fa77..cf0aaef1b8fa 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1619,10 +1619,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>  		}
>  
>  		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> -#ifdef CONFIG_IOMMU_API
> -		iommu_register_group(&pe->table_group,
> -				pe->phb->hose->global_number, pe->pe_number);
> -#endif
>  	}
>  }
>  
> @@ -2661,9 +2657,6 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>  					continue;
>  
>  				table_group = &pe->table_group;
> -				iommu_register_group(&pe->table_group,
> -						pe->phb->hose->global_number,
> -						pe->pe_number);
>  			}
>  			pnv_ioda_setup_bus_iommu_group(pe, table_group,
>  					pe->pbus);
> @@ -2748,14 +2741,17 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>  			IOMMU_TABLE_GROUP_MAX_TABLES;
>  	pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
>  	pe->table_group.pgsizes = pnv_ioda_parse_tce_sizes(phb);
> -#ifdef CONFIG_IOMMU_API
> -	pe->table_group.ops = &pnv_pci_ioda2_ops;
> -#endif
>  
>  	rc = pnv_pci_ioda2_setup_default_config(pe);
>  	if (rc)
>  		return;
>  
> +#ifdef CONFIG_IOMMU_API
> +	pe->table_group.ops = &pnv_pci_ioda2_ops;
> +	iommu_register_group(&pe->table_group, phb->hose->global_number,
> +			     pe->pe_number);
> +#endif
> +
>  	if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
>  		pnv_ioda_setup_bus_dma(pe, pe->pbus);
>  }
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup()
  2020-04-06  3:07 ` [PATCH 4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup() Oliver O'Halloran
@ 2020-04-06  9:51   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:51 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> Historically adding devices to their respective iommu group has been
> handled by the post-init phb fixup for most devices. This was done
> because:
> 
> 1) The IOMMU group is tied to the PE (usually) so we can only setup the
>    iommu groups after we've done resource allocation since BAR location
>    determines the device's PE, and:
> 2) The sysfs directory for the pci_dev needs to be available since
>    iommu_add_device() wants to add an attribute for the iommu group.
> 
> However, since commit 30d87ef8b38d ("powerpc/pci: Fix
> pcibios_setup_device() ordering") both conditions are met when
> hose->ops->dma_dev_setup() is called so there's no real need to do
> this in the fixup.
> 
> Moving the call to iommu_add_device() into pnv_pci_ioda_dma_setup_dev()
> is a nice cleanup since it puts all the per-device IOMMU setup into one
> place. It also results in all (non-nvlink) devices getting their iommu
> group via a common path rather than relying on the bus notifier hack
> in pnv_tce_iommu_bus_notifier() to handle the adding VFs and
> hotplugged devices to their group.
> 
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>  arch/powerpc/platforms/powernv/npu-dma.c  |  8 ++++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 47 +++++++----------------
>  arch/powerpc/platforms/powernv/pci.c      | 20 ----------
>  3 files changed, 21 insertions(+), 54 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 4fbbdfa8b327..df27b8d7e78f 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -469,6 +469,12 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  			compound_group->pgsizes = pe->table_group.pgsizes;
>  	}
>  
> +	/*
> +	 * The gpu would have been added to the iommu group that's created
> +	 * for the PE. Pull it out now.
> +	 */
> +	iommu_del_device(&gpdev->dev);
> +
>         /*
>  	* I'm not sure this is strictly required, but it's probably a good idea
>  	* since the table_group for the PE is going to be attached to the
> @@ -478,7 +484,9 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  	*/
>  	iommu_group_put(pe->table_group.group);
>  
> +	/* now put the GPU into the compound group */
>  	pnv_comp_attach_table_group(npucomp, pe);
> +	iommu_add_device(compound_group, &gpdev->dev);
>  
>  	return compound_group;
>  }
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index cf0aaef1b8fa..9198b7882b57 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1774,12 +1774,10 @@ static void pnv_pci_ioda_dma_dev_setup(struct pci_dev *pdev)
>  	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
>  	pdev->dev.archdata.dma_offset = pe->tce_bypass_base;
>  	set_iommu_table_base(&pdev->dev, pe->table_group.tables[0]);
> -	/*
> -	 * Note: iommu_add_device() will fail here as
> -	 * for physical PE: the device is already added by now;
> -	 * for virtual PE: sysfs entries are not ready yet and
> -	 * tce_iommu_bus_notifier will add the device to a group later.
> -	 */
> +
> +	/* PEs with a DMA weight of zero won't have a group */
> +	if (pe->table_group.group)
> +		iommu_add_device(&pe->table_group, &pdev->dev);
>  }
>  
>  /*
> @@ -2628,39 +2626,20 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>  	struct pnv_ioda_pe *pe;
>  
>  	/*
> -	 * There are 4 types of PEs:
> -	 * - PNV_IODA_PE_BUS: a downstream port with an adapter,
> -	 *   created from pnv_pci_setup_bridge();
> -	 * - PNV_IODA_PE_BUS_ALL: a PCI-PCIX bridge with devices behind it,
> -	 *   created from pnv_pci_setup_bridge();
> -	 * - PNV_IODA_PE_VF: a SRIOV virtual function,
> -	 *   created from pnv_pcibios_sriov_enable();
> -	 * - PNV_IODA_PE_DEV: an NPU or OCAPI device,
> -	 *   created from pnv_pci_ioda_fixup().
> +	 * For non-nvlink devices the IOMMU group is registered when the PE is
> +	 * configured and devices are added to the group when the per-device
> +	 * DMA setup is run. That's done in hose->ops.dma_dev_setup() which is
> +	 * only initialise for "normal" IODA PHBs.
>  	 *
> -	 * Normally a PE is represented by an IOMMU group, however for
> -	 * devices with side channels the groups need to be more strict.
> +	 * For NVLink devices we need to ensure the NVLinks and the GPU end up
> +	 * in the same IOMMU group, so that's handled here.
>  	 */
>  	list_for_each_entry(hose, &hose_list, list_node) {
>  		phb = hose->private_data;
>  
> -		if (phb->type == PNV_PHB_NPU_NVLINK ||
> -		    phb->type == PNV_PHB_NPU_OCAPI)
> -			continue;
> -
> -		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> -			struct iommu_table_group *table_group;
> -
> -			table_group = pnv_try_setup_npu_table_group(pe);
> -			if (!table_group) {
> -				if (!pnv_pci_ioda_pe_dma_weight(pe))
> -					continue;
> -
> -				table_group = &pe->table_group;
> -			}
> -			pnv_ioda_setup_bus_iommu_group(pe, table_group,
> -					pe->pbus);
> -		}
> +		if (phb->type == PNV_PHB_IODA2)
> +			list_for_each_entry(pe, &phb->ioda.pe_list, list)
> +				pnv_try_setup_npu_table_group(pe);
>  	}
>  
>  	/*
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 5bf818246339..091fe1cf386b 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -955,28 +955,8 @@ static int pnv_tce_iommu_bus_notifier(struct notifier_block *nb,
>  		unsigned long action, void *data)
>  {
>  	struct device *dev = data;
> -	struct pci_dev *pdev;
> -	struct pci_dn *pdn;
> -	struct pnv_ioda_pe *pe;
> -	struct pci_controller *hose;
> -	struct pnv_phb *phb;
>  
>  	switch (action) {
> -	case BUS_NOTIFY_ADD_DEVICE:
> -		pdev = to_pci_dev(dev);
> -		pdn = pci_get_pdn(pdev);
> -		hose = pci_bus_to_host(pdev->bus);
> -		phb = hose->private_data;
> -
> -		WARN_ON_ONCE(!phb);
> -		if (!pdn || pdn->pe_number == IODA_INVALID_PE || !phb)
> -			return 0;
> -
> -		pe = &phb->ioda.pe_array[pdn->pe_number];
> -		if (!pe->table_group.group)
> -			return 0;
> -		iommu_add_device(&pe->table_group, dev);
> -		return 0;
>  	case BUS_NOTIFY_DEL_DEVICE:
>  		iommu_del_device(dev);
>  		return 0;
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup
  2020-04-06  3:07 ` [PATCH 5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup Oliver O'Halloran
@ 2020-04-06  9:53   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:53 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> No longer used.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>


Nit: you could fold it into 4/7.


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 32 -----------------------
>  1 file changed, 32 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 9198b7882b57..8b45b8e561e9 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1550,11 +1550,6 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>  
>  static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>  				       struct pnv_ioda_pe *pe);
> -#ifdef CONFIG_IOMMU_API
> -static void pnv_ioda_setup_bus_iommu_group(struct pnv_ioda_pe *pe,
> -		struct iommu_table_group *table_group, struct pci_bus *bus);
> -
> -#endif
>  static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>  {
>  	struct pci_bus        *bus;
> @@ -2590,33 +2585,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>  	.release_ownership = pnv_ioda2_release_ownership,
>  };
>  
> -static void pnv_ioda_setup_bus_iommu_group_add_devices(struct pnv_ioda_pe *pe,
> -		struct iommu_table_group *table_group,
> -		struct pci_bus *bus)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		iommu_add_device(table_group, &dev->dev);
> -
> -		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
> -			pnv_ioda_setup_bus_iommu_group_add_devices(pe,
> -					table_group, dev->subordinate);
> -	}
> -}
> -
> -static void pnv_ioda_setup_bus_iommu_group(struct pnv_ioda_pe *pe,
> -		struct iommu_table_group *table_group, struct pci_bus *bus)
> -{
> -
> -	if (pe->flags & PNV_IODA_PE_DEV)
> -		iommu_add_device(table_group, &pe->pdev->dev);
> -
> -	if ((pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) || bus)
> -		pnv_ioda_setup_bus_iommu_group_add_devices(pe, table_group,
> -				bus);
> -}
> -
>  static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
>  
>  static void pnv_pci_ioda_setup_iommu_api(void)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c
  2020-04-06  3:07 ` [PATCH 6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c Oliver O'Halloran
@ 2020-04-06  9:53   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:53 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> Move it in with the rest of the TCE wrangling rather than carting around
> a static prototype in pci-ioda.c
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 28 +++++++++++++++++
>  arch/powerpc/platforms/powernv/pci-ioda.c     | 30 -------------------
>  arch/powerpc/platforms/powernv/pci.h          |  2 ++
>  3 files changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index 5dc6847d5f4c..f923359d8afc 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> @@ -17,6 +17,34 @@
>  #include <asm/tce.h>
>  #include "pci.h"
>  
> +unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
> +{
> +	struct pci_controller *hose = phb->hose;
> +	struct device_node *dn = hose->dn;
> +	unsigned long mask = 0;
> +	int i, rc, count;
> +	u32 val;
> +
> +	count = of_property_count_u32_elems(dn, "ibm,supported-tce-sizes");
> +	if (count <= 0) {
> +		mask = SZ_4K | SZ_64K;
> +		/* Add 16M for POWER8 by default */
> +		if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> +				!cpu_has_feature(CPU_FTR_ARCH_300))
> +			mask |= SZ_16M | SZ_256M;
> +		return mask;
> +	}
> +
> +	for (i = 0; i < count; i++) {
> +		rc = of_property_read_u32_index(dn, "ibm,supported-tce-sizes",
> +						i, &val);
> +		if (rc == 0)
> +			mask |= 1ULL << val;
> +	}
> +
> +	return mask;
> +}
> +
>  void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>  		void *tce_mem, u64 tce_size,
>  		u64 dma_offset, unsigned int page_shift)
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8b45b8e561e9..c020ade3a846 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2585,8 +2585,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>  	.release_ownership = pnv_ioda2_release_ownership,
>  };
>  
> -static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
> -
>  static void pnv_pci_ioda_setup_iommu_api(void)
>  {
>  	struct pci_controller *hose;
> @@ -2638,34 +2636,6 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>  static void pnv_pci_ioda_setup_iommu_api(void) { };
>  #endif
>  
> -static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb)
> -{
> -	struct pci_controller *hose = phb->hose;
> -	struct device_node *dn = hose->dn;
> -	unsigned long mask = 0;
> -	int i, rc, count;
> -	u32 val;
> -
> -	count = of_property_count_u32_elems(dn, "ibm,supported-tce-sizes");
> -	if (count <= 0) {
> -		mask = SZ_4K | SZ_64K;
> -		/* Add 16M for POWER8 by default */
> -		if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> -				!cpu_has_feature(CPU_FTR_ARCH_300))
> -			mask |= SZ_16M | SZ_256M;
> -		return mask;
> -	}
> -
> -	for (i = 0; i < count; i++) {
> -		rc = of_property_read_u32_index(dn, "ibm,supported-tce-sizes",
> -						i, &val);
> -		if (rc == 0)
> -			mask |= 1ULL << val;
> -	}
> -
> -	return mask;
> -}
> -
>  static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>  				       struct pnv_ioda_pe *pe)
>  {
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index d3bbdeab3a32..0c5845a1f05d 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -244,4 +244,6 @@ extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>  		void *tce_mem, u64 tce_size,
>  		u64 dma_offset, unsigned int page_shift);
>  
> +extern unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
> +
>  #endif /* __POWERNV_PCI_H */
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c
  2020-04-06  3:07 ` [PATCH 7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c Oliver O'Halloran
@ 2020-04-06  9:54   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:54 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> The NVlink IOMMU group setup is only relevant to NVLink devices so move
> it into the NPU containment zone. This let us remove some prototypes in
> pci.h and staticfy some function definitions.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>  arch/powerpc/platforms/powernv/npu-dma.c  | 54 +++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 60 +++--------------------
>  arch/powerpc/platforms/powernv/pci.h      |  6 +--
>  3 files changed, 60 insertions(+), 60 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index df27b8d7e78f..abeaa533b976 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -15,6 +15,7 @@
>  
>  #include <asm/debugfs.h>
>  #include <asm/powernv.h>
> +#include <asm/ppc-pci.h>
>  #include <asm/opal.h>
>  
>  #include "pci.h"
> @@ -425,7 +426,8 @@ static void pnv_comp_attach_table_group(struct npu_comp *npucomp,
>  	++npucomp->pe_num;
>  }
>  
> -struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
> +static struct iommu_table_group *
> +	pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  {
>  	struct iommu_table_group *compound_group;
>  	struct npu_comp *npucomp;
> @@ -491,7 +493,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>  	return compound_group;
>  }
>  
> -struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
> +static struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
>  {
>  	struct iommu_table_group *table_group;
>  	struct npu_comp *npucomp;
> @@ -534,6 +536,54 @@ struct iommu_table_group *pnv_npu_compound_attach(struct pnv_ioda_pe *pe)
>  
>  	return table_group;
>  }
> +
> +void pnv_pci_npu_setup_iommu_groups(void)
> +{
> +	struct pci_controller *hose;
> +	struct pnv_phb *phb;
> +	struct pnv_ioda_pe *pe;
> +
> +	/*
> +	 * For non-nvlink devices the IOMMU group is registered when the PE is
> +	 * configured and devices are added to the group when the per-device
> +	 * DMA setup is run. That's done in hose->ops.dma_dev_setup() which is
> +	 * only initialise for "normal" IODA PHBs.
> +	 *
> +	 * For NVLink devices we need to ensure the NVLinks and the GPU end up
> +	 * in the same IOMMU group, so that's handled here.
> +	 */
> +	list_for_each_entry(hose, &hose_list, list_node) {
> +		phb = hose->private_data;
> +
> +		if (phb->type == PNV_PHB_IODA2)
> +			list_for_each_entry(pe, &phb->ioda.pe_list, list)
> +				pnv_try_setup_npu_table_group(pe);
> +	}
> +
> +	/*
> +	 * Now we have all PHBs discovered, time to add NPU devices to
> +	 * the corresponding IOMMU groups.
> +	 */
> +	list_for_each_entry(hose, &hose_list, list_node) {
> +		unsigned long  pgsizes;
> +
> +		phb = hose->private_data;
> +
> +		if (phb->type != PNV_PHB_NPU_NVLINK)
> +			continue;
> +
> +		pgsizes = pnv_ioda_parse_tce_sizes(phb);
> +		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +			/*
> +			 * IODA2 bridges get this set up from
> +			 * pci_controller_ops::setup_bridge but NPU bridges
> +			 * do not have this hook defined so we do it here.
> +			 */
> +			pe->table_group.pgsizes = pgsizes;
> +			pnv_npu_compound_attach(pe);
> +		}
> +	}
> +}
>  #endif /* CONFIG_IOMMU_API */
>  
>  int pnv_npu2_init(struct pci_controller *hose)
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c020ade3a846..dba0c2c09f61 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1288,7 +1288,7 @@ static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
>  		pnv_ioda_setup_npu_PE(pdev);
>  }
>  
> -static void pnv_pci_ioda_setup_PEs(void)
> +static void pnv_pci_ioda_setup_nvlink(void)
>  {
>  	struct pci_controller *hose;
>  	struct pnv_phb *phb;
> @@ -1312,6 +1312,11 @@ static void pnv_pci_ioda_setup_PEs(void)
>  		list_for_each_entry(pe, &phb->ioda.pe_list, list)
>  			pnv_npu2_map_lpar(pe, MSR_DR | MSR_PR | MSR_HV);
>  	}
> +
> +#ifdef CONFIG_IOMMU_API
> +	/* setup iommu groups so we can do nvlink pass-thru */
> +	pnv_pci_npu_setup_iommu_groups();
> +#endif
>  }
>  
>  #ifdef CONFIG_PCI_IOV
> @@ -2584,56 +2589,6 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
>  	.take_ownership = pnv_ioda2_take_ownership,
>  	.release_ownership = pnv_ioda2_release_ownership,
>  };
> -
> -static void pnv_pci_ioda_setup_iommu_api(void)
> -{
> -	struct pci_controller *hose;
> -	struct pnv_phb *phb;
> -	struct pnv_ioda_pe *pe;
> -
> -	/*
> -	 * For non-nvlink devices the IOMMU group is registered when the PE is
> -	 * configured and devices are added to the group when the per-device
> -	 * DMA setup is run. That's done in hose->ops.dma_dev_setup() which is
> -	 * only initialise for "normal" IODA PHBs.
> -	 *
> -	 * For NVLink devices we need to ensure the NVLinks and the GPU end up
> -	 * in the same IOMMU group, so that's handled here.
> -	 */
> -	list_for_each_entry(hose, &hose_list, list_node) {
> -		phb = hose->private_data;
> -
> -		if (phb->type == PNV_PHB_IODA2)
> -			list_for_each_entry(pe, &phb->ioda.pe_list, list)
> -				pnv_try_setup_npu_table_group(pe);
> -	}
> -
> -	/*
> -	 * Now we have all PHBs discovered, time to add NPU devices to
> -	 * the corresponding IOMMU groups.
> -	 */
> -	list_for_each_entry(hose, &hose_list, list_node) {
> -		unsigned long  pgsizes;
> -
> -		phb = hose->private_data;
> -
> -		if (phb->type != PNV_PHB_NPU_NVLINK)
> -			continue;
> -
> -		pgsizes = pnv_ioda_parse_tce_sizes(phb);
> -		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> -			/*
> -			 * IODA2 bridges get this set up from
> -			 * pci_controller_ops::setup_bridge but NPU bridges
> -			 * do not have this hook defined so we do it here.
> -			 */
> -			pe->table_group.pgsizes = pgsizes;
> -			pnv_npu_compound_attach(pe);
> -		}
> -	}
> -}
> -#else /* !CONFIG_IOMMU_API */
> -static void pnv_pci_ioda_setup_iommu_api(void) { };
>  #endif
>  
>  static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> @@ -3132,8 +3087,7 @@ static void pnv_pci_enable_bridges(void)
>  
>  static void pnv_pci_ioda_fixup(void)
>  {
> -	pnv_pci_ioda_setup_PEs();
> -	pnv_pci_ioda_setup_iommu_api();
> +	pnv_pci_ioda_setup_nvlink();
>  	pnv_pci_ioda_create_dbgfs();
>  
>  	pnv_pci_enable_bridges();
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 0c5845a1f05d..20941ef2706e 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -209,11 +209,7 @@ extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>  /* Nvlink functions */
>  extern void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass);
>  extern void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
> -extern struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe);
> -extern struct iommu_table_group *pnv_try_setup_npu_table_group(
> -		struct pnv_ioda_pe *pe);
> -extern struct iommu_table_group *pnv_npu_compound_attach(
> -		struct pnv_ioda_pe *pe);
> +extern void pnv_pci_npu_setup_iommu_groups(void);
>  
>  /* pci-ioda-tce.c */
>  #define POWERNV_IOMMU_DEFAULT_LEVELS	2
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Make PowerNV IOMMU group setup saner (and fix it for hotpug)
  2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
                   ` (6 preceding siblings ...)
  2020-04-06  3:07 ` [PATCH 7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c Oliver O'Halloran
@ 2020-04-06  9:56 ` Alexey Kardashevskiy
  7 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06  9:56 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev



On 06/04/2020 13:07, Oliver O'Halloran wrote:
> Currently on PowerNV the IOMMU group of a device is initialised in
> boot-time fixup which runs after devices are probed. Because this is
> only run at boot time hotplugged devices do not recieve an iommu group
> assignment which prevents them from being passed through to a guest.
> 
> This series fixes that by moving the point where IOMMU groups are
> registered to when we configure DMA for a PE, and moves the point where
> we add a device to the PE's IOMMU group into the per-device DMA setup
> callback for IODA phbs (pnv_pci_ioda_dma_dev_setup()). This change means
> that we'll do group setup for hotplugged devices and that we can remove
> the hack we have for VFs which are currently added to their group
> via a bus notifier.
> 
> With this change there's no longer any per-device setup that needs to
> run in a fixup for ordinary PCI devices. The exception is, as per usual,
> NVLink devices. For those the GPU and any of it's NVLink devices need
> to be in a "compound" IOMMU group which keeps the DMA address spaces
> of each device in sync with it's attached devices. As a result that
> setup can only be done when both the NVLink devices and the GPU device
> has been probed, so that setup is still done in the fixup. Sucks, but
> it's still an improvement.
> 
> Boot tested on a witherspoon with 6xGPUs and it didn't crash so it must
> be good.

Thanks for cleaning this up!

I tried this with IOV on P8 (garrison2) and witherspoon+GPU+NPU
passthrough, works, as before, IOMMU group numbers change but we never
relied on those anyway.



-- 
Alexey

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation
  2020-04-06  3:07 ` [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation Oliver O'Halloran
  2020-04-06  9:48   ` Alexey Kardashevskiy
@ 2020-06-09  5:29   ` Michael Ellerman
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2020-06-09  5:29 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: aik

On Mon, 6 Apr 2020 13:07:39 +1000, Oliver O'Halloran wrote:
> Re-work the control flow a bit so what's going on is a little clearer.
> This also ensures the table_group is only initialised once in the P9
> case. This shouldn't be a functional change since all the GPU PCI
> devices should have the same table_group configuration, but it does
> look strange.

Applied to powerpc/next.

[1/7] powerpc/powernv/npu: Clean up compound table group initialisation
      https://git.kernel.org/powerpc/c/6984856865b55c9c1ee0814c30296119cd8ba511
[2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config
      https://git.kernel.org/powerpc/c/6cff91b2b97b1b40a52971c9b1e99980dd49fd54
[3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup
      https://git.kernel.org/powerpc/c/9b9408c55935ecc3b1c27b3eeb5a507394113cbb
[4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup()
      https://git.kernel.org/powerpc/c/84d8cc076723058cc294f4360db6ff7758c25b74
[5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup
      https://git.kernel.org/powerpc/c/f39b8b10fcc5d4617d2be5f2910e017a55444b43
[6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c
      https://git.kernel.org/powerpc/c/96e2006a9dbc02cb1c103521405d457438a2e260
[7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c
      https://git.kernel.org/powerpc/c/03b7bf341c18ff19129cc2825b62bb0e212463f1

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-06-09  6:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-06  3:07 Make PowerNV IOMMU group setup saner (and fix it for hotpug) Oliver O'Halloran
2020-04-06  3:07 ` [PATCH 1/7] powerpc/powernv/npu: Clean up compound table group initialisation Oliver O'Halloran
2020-04-06  9:48   ` Alexey Kardashevskiy
2020-06-09  5:29   ` Michael Ellerman
2020-04-06  3:07 ` [PATCH 2/7] powerpc/powernv/iov: Don't add VFs to iommu group during PE config Oliver O'Halloran
2020-04-06  9:49   ` Alexey Kardashevskiy
2020-04-06  3:07 ` [PATCH 3/7] powerpc/powernv/pci: Register iommu group at PE DMA setup Oliver O'Halloran
2020-04-06  9:51   ` Alexey Kardashevskiy
2020-04-06  3:07 ` [PATCH 4/7] powerpc/powernv/pci: Add device to iommu group during dma_dev_setup() Oliver O'Halloran
2020-04-06  9:51   ` Alexey Kardashevskiy
2020-04-06  3:07 ` [PATCH 5/7] powerpc/powernv/pci: Delete old iommu recursive iommu setup Oliver O'Halloran
2020-04-06  9:53   ` Alexey Kardashevskiy
2020-04-06  3:07 ` [PATCH 6/7] powerpc/powernv/pci: Move tce size parsing to pci-ioda-tce.c Oliver O'Halloran
2020-04-06  9:53   ` Alexey Kardashevskiy
2020-04-06  3:07 ` [PATCH 7/7] powerpc/powernv/npu: Move IOMMU group setup into npu-dma.c Oliver O'Halloran
2020-04-06  9:54   ` Alexey Kardashevskiy
2020-04-06  9:56 ` Make PowerNV IOMMU group setup saner (and fix it for hotpug) Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.